In this work we study various notions of uncertainty for angular momentum in the spin-$s$ representation of SU(2). We characterize the “uncertainty regions” given by all vectors whose components are the variances of the three angular momentum components. A basic feature of this set is a lower bound for the sum of the three variances. We give a method for obtaining optimal lower bounds for uncertainty regions for general operator triples, and evaluate these for small $s$. Further lower bounds are derived by generalizing the technique by which Robertson obtained his state-dependent lower bound. These are optimal for large $s$, since they are saturated by states taken from the Holstein-Primakoff approximation. We show that, for all $s$, all variances are consistent with the so-called vector model, i.e., they can also be realized by a classical probability measure on a sphere of radius $\sqrt{s(s+1)}$. Entropic uncertainty relations can be discussed similarly, but are minimized by different states than those minimizing the variances for small $s$. For large $s$ the Maassen-Uffink bound becomes sharp and we explicitly describe the extremalizing states.

Measurement uncertainty, as recently discussed by Busch, Lahti and Werner for position and momentum, is introduced, and a generalized observable (POVM) which minimizes the worst case measurement uncertainty of all angular momentum components is explicitly determined, along with the minimal uncertainty. The output vectors for the optimal measurement all have the same length $r(s)$, where $r(s)/s \to 1$ as $s \to \infty$.
I. INTRODUCTION

The textbook literature on quantum mechanics seems to agree that the uncertainty relations for angular momentum, and indeed for any pair of quantum observables $A$, $B$, should be given by Robertson’s [26] inequality

$$\Delta^2_\rho(A)\,\Delta^2_\rho(B) \geq \frac{1}{4}\left(\operatorname{tr}\rho\, i[A,B]\right)^2, \tag{1}$$

valid for any density operator $\rho$, with $\Delta^2_\rho(A)$ denoting the variance of the outcomes of a measurement of $A$ on the state $\rho$. Perhaps the main reason for the ubiquity of this relation in textbooks is that it is such a convenient intermediate step to the proof of uncertainty relations for position and momentum. In that case the right-hand side is $\hbar^2/4$, independently of the state $\rho$. For any pair $A, B$ other than a canonical pair, however, the relation (1) makes a much weaker statement, requiring some prior information about the state. This raises the question of when, and with what bounds, it is true that

[Preparation Uncertainty:] 

One cannot choose a state $\rho$ so that $\Delta^2_\rho(A)$ and $\Delta^2_\rho(B)$ simultaneously become arbitrarily small.

Robertson’s relation supports no such conclusion, but on the other hand such a statement does hold in many 
situations. In fact, in a finite dimensional context it is true whenever A and B do not have a common eigenvector. 
In this paper we will provide optimal bounds for angular momentum components, establishing the methods for 
deriving optimal bounds in the general case along the way. 

The second reason that (1) is unsatisfactory is that it addresses only the preparation side of uncertainty, in the sense loosely described in the italicized sentence above. However, there is always also a measurement aspect to uncertainty, for which Heisenberg’s $\gamma$-ray microscope [9] is a paradigm. The error-disturbance tradeoff would be stated as

[Error-Disturbance Uncertainty:] 

An approximate measurement of $A$ with accuracy $\Delta A$ disturbs the system in such a way that from the post-measurement state and the measurement result for $A$ the distribution for observable $B$ can only be inferred with accuracy $\Delta B$, where $\Delta A$ and $\Delta B$ cannot be simultaneously arbitrarily small.

It is often easier to think of the whole experiment as a joint measurement of $A$ and $B$, and to state relations of the kind:

[Measurement Uncertainty:] 

For any measurement device with both an $A$-type and a $B$-type output, the marginals will have worst-case errors $\Delta A$, $\Delta B$ with respect to ideal measurements of $A$ and $B$, satisfying a tradeoff relation.

Again, generic observables and angular momenta satisfy non-trivial relations of this kind. Errors $\Delta A = \Delta B = 0$ can occur only if $A$ and $B$ commute, i.e., under an even more stringent condition than for preparation uncertainty.
In this paper we will provide some sharp measurement uncertainty relations for angular momentum, establishing 
along the way some methods which may be of interest in more general cases. 

There is a third reason that one should not be satisfied with (1) with $A = L_1$, $B = L_2$: it involves only two of the three components of angular momentum. But there is no reason why tradeoff relations as described above should not be stated for more than two observables. For angular momentum this seems especially natural. Moreover,
it seems natural to state relations for all components simultaneously, i.e., not only for the three components 
along the axes of an arbitrarily chosen Cartesian reference frame, but for the angular momenta along arbitrary 
rotation axes, restoring the rotational symmetry of the problem. 

Indeed the idea that uncertainty should involve just pairs of observables can be traced to Bohr’s habit of 
expressing complementarity as a relation between “opposite” aspects, like ‘in vitro’ and ‘in vivo’ biology. This 
dualistic preference had more to do with his philosophy than with the actual structure of quantum mechanics. 
Other founding fathers of quantum mechanics did not share this preference. As Wigner said in an interview 
[35] in 1963: 

I always felt also that this duality is not solved and in this I may have been under Johnny’s [John 
von Neumann] influence, who said, “Well, there are many things which do not commute and you can 
easily find three operators which do not commute.” I also was very much under the influence of the 
spin where you have three variables which are entirely, so to speak, symmetric within themselves 
and clearly show that it isn’t two that are complementary; and I still don’t feel that this duality is 
a terribly significant and striking property of the concepts. 

In this spirit, an uncertainty relation for triples of canonical operators was recently proposed and proved [17], 
and further generalizations are clearly possible. However, we will stick to angular momentum in this paper, and 
particularly seek to establish relations which do not break rotation invariance. 

Our paper responds to an increasing interest in quantitative uncertainty relations. This interest is connected to an increasing number of experiments reaching the uncertainty-dominated regime, so that rather than qualitative or order-of-magnitude statements one is more interested in the precise location of the quantum limits.
Measurement uncertainty was made rigorous in [3, 5, 27]. There is also a controversial [4, 6] state-dependent version [23]. That adequate uncertainty relations are sometimes better stated in terms of the sum of variances rather than their product has been noted repeatedly [11, 15, 21]. There has been some renewed interest also in the uncertainty between angular momentum and angular position [8], angular momentum of certain states [25] and other non-standard complementary pairs [19].


A. Setting and notation 

In physics, angular momentum appears as orbital or as spin angular momentum. Our theory applies to both, but it must be noted that the bounds obtained do depend on the quantum number $s$ of $L^2$. For example, there are states with vanishing orbital angular momentum uncertainties (precisely the rotation invariant ones, i.e., $s = 0$), but there are none for an $s = 1/2$ degree of freedom. Therefore, one first has to decompose the given space into irreducible angular momentum components (integer or half-integer), and then use the results for the appropriate $s$. Hence we will consider throughout a system of spin $s$, with $s = 1/2, 1, 3/2, \dots$, in its $(2s+1)$-dimensional Hilbert space $\mathcal H = \mathbb C^{2s+1}$. The three angular momentum components will be denoted by $L_k$, $k = 1,2,3$, and the component along a unit vector $e\in\mathbb R^3$ by $e\cdot L$. We denote by $|m\rangle$ the eigenvectors of $L_3$, so that $L_3|m\rangle = m|m\rangle$, where $-s \le m \le s$, and, with $L_\pm = L_1 \pm iL_2$,

$$L_\pm|m\rangle = \sqrt{s(s+1) - m(m\pm 1)}\;|m\pm 1\rangle. \tag{2}$$
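To make the convention in (2) concrete, here is a minimal NumPy sketch (our own illustration, not code from this work) that builds $L_1 = (L_+ + L_-)/2$ and $L_2 = (L_+ - L_-)/(2i)$ from the ladder operator and checks $[L_1, L_2] = iL_3$ and $L_1^2 + L_2^2 + L_3^2 = s(s+1)\mathbb 1$. The helper names `spin_matrices` and `check_algebra` and the basis ordering $m = s, s-1, \dots, -s$ are assumptions made for illustration.

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)  # L3 eigenvalues in descending order
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        # Eq. (2): L+ |m> = sqrt(s(s+1) - m(m+1)) |m+1>; row k-1 holds m[k] + 1
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    l_minus = l_plus.T  # L- is the adjoint of the real matrix L+
    L1 = (l_plus + l_minus) / 2
    L2 = (l_plus - l_minus) / 2j
    L3 = np.diag(m)
    return L1.astype(complex), L2, L3.astype(complex)


def check_algebra(s):
    """True if [L1, L2] = i L3 and L1^2 + L2^2 + L3^2 = s(s+1) 1."""
    L1, L2, L3 = spin_matrices(s)
    dim = L3.shape[0]
    comm_ok = np.allclose(L1 @ L2 - L2 @ L1, 1j * L3)
    casimir_ok = np.allclose(L1 @ L1 + L2 @ L2 + L3 @ L3, s * (s + 1) * np.eye(dim))
    return comm_ok and casimir_ok


for s in (0.5, 1, 1.5, 2):
    print(s, check_algebra(s))
```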

Rotation matrices, whether they are considered as elements of SO(3) or of SU(2), will typically be denoted by $R$, the corresponding matrix in the spin-$s$ representation by $U_R$, and normalized Haar measure on SO(3) or SU(2) by $dR$. We will always set $\hbar = 1$. Observables are in general always allowed to be normalized positive operator valued measures (POVMs), with a typical letter $F$. For a self-adjoint operator $A$ the spectral measure is an observable in this sense, denoted by $E^A$. For a component of angular momentum, i.e., $A = e\cdot L$, we write $E^e$ for short. For the unit vectors $e_k$ along the axes we further abbreviate this to $E^k$.

For the variance of the probability distribution obtained by measuring $F$ on a state (density operator) $\rho$ we write

$$\Delta^2_\rho(F) = \min_{\xi} \int (x-\xi)^2\, \operatorname{tr}\rho F(dx). \tag{3}$$

This minimum is taken over a quadratic expression in $\xi$, and it is attained when $\xi = \int x\, \operatorname{tr}\rho F(dx)$ is the mean value of the distribution. The most familiar case is that of the spectral measure for an operator $A$, in which case we abbreviate the variance by $\Delta^2_\rho(A)$. Then the second moment $\int x^2\, \operatorname{tr}\rho F(dx) = \operatorname{tr}(\rho A^2)$ can also be expressed in terms of $A$, and we get

$$\Delta^2_\rho(A) = \Delta^2_\rho(E^A) = \operatorname{tr}(\rho A^2) - (\operatorname{tr}\rho A)^2. \tag{4}$$
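The two expressions for the variance can be compared numerically. The sketch below is an illustration under our own choices (a random $4\times4$ density operator and a random self-adjoint $A$, neither taken from the text; the function names are hypothetical): it evaluates (3) from the outcome distribution of the spectral measure, with $\xi$ set to the mean, and compares it with the moment formula (4).

```python
import numpy as np


def variance_spectral(rho, A):
    """Variance (3) of the spectral measure E^A in state rho, via the outcome distribution."""
    eigvals, eigvecs = np.linalg.eigh(A)
    # outcome probabilities <x_i| rho |x_i> for the orthonormal eigenvectors x_i of A
    probs = np.einsum("ji,jk,ki->i", eigvecs.conj(), rho, eigvecs).real
    mean = probs @ eigvals  # the minimizing xi in (3)
    return probs @ (eigvals - mean) ** 2


def variance_moments(rho, A):
    """Right-hand side of (4): tr(rho A^2) - (tr rho A)^2."""
    return np.trace(rho @ A @ A).real - np.trace(rho @ A).real ** 2


rng = np.random.default_rng(1)
g = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = g @ g.conj().T
rho /= np.trace(rho).real  # random density operator
h = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (h + h.conj().T) / 2  # random self-adjoint operator

print(variance_spectral(rho, A), variance_moments(rho, A))
```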

We say that a unit vector $|\phi\rangle\in\mathcal H$ is a maximal weight vector if for some direction $e\in\mathbb R^3$ it satisfies $e\cdot L\,|\phi\rangle = s|\phi\rangle$. This is the same as saying that, for some rotation $R\in SU(2)$, $|\phi\rangle = U_R|s\rangle$ up to a phase. For such a vector we call $\rho = |\phi\rangle\langle\phi|$ a spin coherent state. These states are candidates for states of minimal uncertainty.


B. Summary of Main Results 

We now describe the structure of our paper and the main results. 

Sect. II: Preparation uncertainty. — The basic object of study is the variance $\Delta^2_\rho(e\cdot L)$ of the angular momentum in direction $e$ as a function of the unit vector $e$, especially properties which hold for an arbitrary state $\rho$. After clarifying some general features and explicitly solving the two cases $s \le 1$, i.e., $s = 1/2$ and $s = 1$ (Sect. IIB), we look at the traditional setting of just two components $L_1, L_2$. The set of uncertainty pairs $(\Delta^2_\rho(L_1), \Delta^2_\rho(L_2))$ is studied, and the fact that not both variances can be small is found to be well expressed by a lower bound not on the product but on the sum of the variances. We compute numerically (and exactly up to $s = 3/2$) the best constants in

$$\Delta^2_\rho(L_1) + \Delta^2_\rho(L_3) \geq c_2(s) \tag{5}$$

and find that they asymptotically behave like $c_2(s) \sim s^{2/3}$ (Sect. IID). For three components the uncertainty region is also studied in some detail. A prominent feature is again given by a linear bound [11]

$$\Delta^2_\rho(L_1) + \Delta^2_\rho(L_2) + \Delta^2_\rho(L_3) \geq s, \tag{6}$$

which is very easy to prove (see Sect. IIA, (17)).
Turning to features of the whole function $e \mapsto \Delta^2_\rho(e\cdot L)$, we show in Sect. IIE that for any $\rho$ there is at least one direction $e$ such that $\Delta^2_\rho(e\cdot L) \ge s/2$, i.e.,

$$\max_e \Delta^2_\rho(e\cdot L) \geq \frac{s}{2}. \tag{7}$$

This bound is optimal, since it is saturated by spin coherent states. We generalize from the maximum (seen as the $L^\infty$-norm) to all $L^p$-norms (Prop. 1).

For large $s$, Eq. (6) suggests rescaling the variances by $1/s$ (Sect. IIG). Indeed, the set of triples $(\Delta^2_\rho(L_1)/s, \Delta^2_\rho(L_2)/s, \Delta^2_\rho(L_3)/s)$ converges as $s\to\infty$ (Theorem 4). The lower bound on the limit set is obtained by a generalization of Robertson’s method for proving (1) (Sect. IIF), which for finite $s$ is (39)

$$\Delta^2_\rho(L_1)\left(\Delta^2_\rho(L_2) + \Delta^2_\rho(L_3)\right) \geq \frac14\left(s(s+1) - \Delta^2_\rho(L_1) - \Delta^2_\rho(L_2) - \Delta^2_\rho(L_3)\right), \tag{8}$$

where the components are ordered so that $\Delta^2_\rho(L_1) \ge \Delta^2_\rho(L_2) \ge \Delta^2_\rho(L_3)$. The upper bound in Theorem 4 is provided by a family of states suggested by the Holstein-Primakoff approximation.

Sect. IIIA: Vector model and moment problems. — We revisit the so-called vector model of angular momentum,
a classical model which is still found in some textbooks. We show that it can correctly portray the moments up 
to second order (i.e., means and variances) of the angular momentum observables, but fails on higher moments 
and, of course, on correlations. 

Sect. IIIB: Entropic uncertainty relations. — We discuss entropic uncertainty relations only very briefly. We 
point out that the criteria “variance” and “entropy” may disagree on which of two distributions is “more sharply 
concentrated”. This effect is illustrated by the uncertainty diagrams for s = 1. We show also that the general 
Maassen-Uffink bound [20], while suboptimal for $s = 1$, becomes sharp for $s \to \infty$, and determine a family of states saturating it.

Sect. IV: Measurement uncertainty. — We consider two measures for the deviation of an approximate observable from an ideal reference, called metric error and calibration error. We then discuss uncertainty relations for the joint measurement of all angular momentum components. The output of such an observable is an angular momentum 3-vector $\eta$, from which one can obtain a measurement of the $e$-component (for any unit vector $e$) simply by taking $e\cdot\eta$ as the output. Such a marginal observable can in turn be compared with the quantum observable $e\cdot L$. The uncertainty relation in this case gives a lower bound on the error in the worst case with respect to $e$. Our main result (Theorem 12) is a determination of the optimal bound, and an observable saturating it. It turns out that the optimal observable is covariant with respect to rotations, and this implies that it simultaneously minimizes the maximal metric error and the maximal calibration error. All the output vectors have the same length $r_{\min}(s)$, which depends in a non-trivial way on $s$ but is close to $s$ for large $s$.

II. PREPARATION UNCERTAINTY 

In this section we consider the preparation uncertainty, i.e., a property of a given state $\rho$. For every unit vector $e\in\mathbb R^3$ we can form the variance of the angular momentum component $e\cdot L$, and hence study the function

$$v(e) = \Delta^2_\rho(e\cdot L) = \operatorname{tr}\left(\rho\,(e\cdot L)^2\right) - \left(\operatorname{tr}\rho\, e\cdot L\right)^2 \tag{9}$$

on the unit sphere. For the purposes of this section, this function summarizes all the uncertainty properties of the state $\rho$, and all results in this section are statements about properties of this function which are valid for all $\rho$. To visualize the function $v$, we can use a three-dimensional radial plot, i.e., the surface containing all vectors $v(e)e$, as $e$ runs over all unit vectors. A typical radial plot is shown in FIG. 1. Often we are also interested in the components with respect to some Cartesian reference frame. In this case the best visualization is an uncertainty diagram, which represents the possible pairs, triples, etc. of variances in the same state. In our case this will be the set of pairs $(v(e_1), v(e_2))$, or triples $(v(e_1), v(e_2), v(e_3))$. The diagrams for $s = 1$ are shown in FIG. 4. In this diagram it can be seen that the uncertainty region is not convex in general. Since we are only interested in lower bounds, we always take the monotone closure of the uncertainty region, i.e., we also include with every point the whole quadrant/octant of points in which one or more of the coordinates increase. This is described in more detail in Sect. IIC.

It turns out that after a rotation to suitable principal axes (which has already been carried out in FIG. 1), the function $v$ depends only on three real parameters $\mu_1, \mu_2, \mu_3$. To see this we introduce the $3\times3$-matrix $A = A(\rho)$ by

$$v(e) = \sum_{jk} e_j e_k A_{jk}(\rho), \quad\text{with} \tag{10}$$

$$A_{jk}(\rho) = \frac12\operatorname{tr}\rho\,(L_jL_k + L_kL_j) - \lambda_j\lambda_k, \tag{11}$$

$$\lambda_j(\rho) = \operatorname{tr}(\rho L_j). \tag{12}$$
FIG. 1. The function v(e) from equation (10), where A is diagonal.


Since the $L_j$ transform as a vector operator (i.e., with respect to the spin-1 representation of SU(2)) and $A(\rho)$ is a real symmetric matrix, we see that by choice of an appropriate coordinate basis in $\mathbb R^3$ we can diagonalize $A$, i.e., we can choose $A_{jk}(\rho) = \mu_j\delta_{jk}$ with $\mu_1 \ge \mu_2 \ge \mu_3 \ge 0$.

The eigenvalues $(\mu_1, \mu_2, \mu_3)$ of any matrix $A(\rho)$ are also a possible triple of variances, namely for a suitably rotated state. In fact, we can find the uncertainty triples for all rotated versions of $\rho$ quite easily: when $R$ is the rotation matrix taking the eigenbasis of $A$ to the basis $e_1, e_2, e_3$ under consideration, then

$$v(e_j) = \sum_k R_{jk}^2\,\mu_k. \tag{13}$$

Now the squared rotation matrix $(R_{jk}^2)$ is doubly stochastic, so by Birkhoff’s theorem it is a convex combination of permutation matrices. We therefore find the variance triple in basis $e$ in the convex hull of the six points arising from the triple of $\mu_k$ by permutation. These six points lie in a plane orthogonal to the vector $(1,1,1)$, so they form a hexagon (see FIG. 2), which degenerates into a triangle if two of the $\mu_k$ are equal. One can easily check that the full hexagon is attained by squared rotation matrices.
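Relation (13) and the doubly stochastic structure behind the hexagon can be checked directly. In this sketch (our illustration; the random state, the choice $s = 3/2$ and the helper names are assumptions) `covariance_matrix` implements (11) and (12), and the eigenvector matrix of $A$ plays the role of $R$; it is orthogonal, possibly with determinant $-1$, which does not matter for (13).

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))  # Eq. (2)
    l_minus = l_plus.T
    return ((l_plus + l_minus) / 2).astype(complex), (l_plus - l_minus) / 2j, np.diag(m).astype(complex)


def covariance_matrix(rho, L):
    """Matrix A(rho) from (11) and mean vector lambda from (12)."""
    lam = np.array([np.trace(rho @ Lj).real for Lj in L])
    sym = np.array([[np.trace(rho @ (Lj @ Lk + Lk @ Lj)).real / 2 for Lk in L] for Lj in L])
    return sym - np.outer(lam, lam), lam


rng = np.random.default_rng(0)
s = 1.5
L = spin_matrices(s)
dim = L[0].shape[0]
g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = g @ g.conj().T
rho /= np.trace(rho).real

A, lam = covariance_matrix(rho, L)
mu, R = np.linalg.eigh(A)  # mu ascending; column k of R is the k-th principal axis
v = np.diag(A)  # variances v(e_j) along the coordinate axes
P = R ** 2  # squared rotation matrix appearing in (13)

print(np.allclose(v, P @ mu))  # Eq. (13)
print(np.allclose(P.sum(axis=0), 1), np.allclose(P.sum(axis=1), 1))  # doubly stochastic
```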



FIG. 2. The orbit of a point under permutations of the coordinates, and its convex hull. 


A. Basic bounds 


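For an $L_3$-eigenstate $\rho = |m\rangle\langle m|$ we find $\lambda = (0, 0, m)$ and

$$A(|m\rangle\langle m|) = \frac12\left(s(s+1) - m^2\right)\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix} \geq \frac{s}{2}\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}, \tag{14}$$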
FIG. 3. Pure state uncertainties for $s = 1$. Left panel: Parabola after (19) with its permutations. Adding to every point the hexagon generated by its permutation orbit generates the solid shown in the right panel. This is precisely the set of all variance triples for pure states. Its monotone closure is shown in FIG. 4.


which shows that, for eigenstates, the variances in all directions are smallest for the maximal weight $|m| = s$. Maximal variance is attained for an equal-weight mixture $\rho = \frac12\left(|s\rangle\langle s| + |{-s}\rangle\langle{-s}|\right)$, which has $L_3$-variance $s^2$. Hence, for all $e$:

$$0 \le v(e) \le s^2. \tag{15}$$


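The average $\overline{v}$ of $v(e)$ over $e$ with respect to the surface measure on the sphere, or equivalently the average of $v(Re)$ over Haar-random rotations $R$, is readily computed from (10), since the average of $e_je_k$ over the unit sphere is just $\delta_{jk}/3$. Therefore, from (10), using $\sum_j L_j^2 = s(s+1)\mathbb 1$ and $|\lambda| \le s$ (the component of $L$ along $\lambda/|\lambda|$ has largest eigenvalue $s$),

$$\overline{v} = \frac13\operatorname{tr}A(\rho) = \frac13\left(s(s+1) - |\lambda|^2\right) \geq \frac13\left(s(s+1) - s^2\right) = \frac{s}{3}. \tag{16}$$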

In the same simple way we can get an inequality for the variances along the three coordinate directions of a Cartesian coordinate system:

$$\sum_{k=1}^{3} v(e_k) = \operatorname{tr}A(\rho) \geq s. \tag{17}$$

In both cases equality holds precisely for $|\lambda| = s$, i.e., if $\rho$ is an eigenstate of one of the operators $e\cdot L$ for the maximum eigenvalue $m = s$.
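A quick numerical sanity check of (16) and (17), again a sketch with helper names of our own and randomly drawn test states: for several spins, $\operatorname{tr}A(\rho)$ agrees with $s(s+1) - |\lambda|^2$ and never drops below $s$, while the maximal weight state $|s\rangle\langle s|$ attains the bound.

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))  # Eq. (2)
    l_minus = l_plus.T
    return ((l_plus + l_minus) / 2).astype(complex), (l_plus - l_minus) / 2j, np.diag(m).astype(complex)


def covariance_matrix(rho, L):
    """Matrix A(rho) from (11) and mean vector lambda from (12)."""
    lam = np.array([np.trace(rho @ Lj).real for Lj in L])
    sym = np.array([[np.trace(rho @ (Lj @ Lk + Lk @ Lj)).real / 2 for Lk in L] for Lj in L])
    return sym - np.outer(lam, lam), lam


rng = np.random.default_rng(2)
records = []  # entries (s, tr A, s(s+1) - |lambda|^2)
for s in (0.5, 1, 1.5, 2):
    L = spin_matrices(s)
    dim = L[0].shape[0]
    for _ in range(20):
        g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
        rho = g @ g.conj().T
        rho /= np.trace(rho).real
        A, lam = covariance_matrix(rho, L)
        # tr A = s(s+1) - |lambda|^2, since L1^2 + L2^2 + L3^2 = s(s+1) 1
        records.append((s, np.trace(A), s * (s + 1) - lam @ lam))

# maximal weight state |s><s| for s = 3/2 (first basis vector)
top = np.zeros((4, 4), dtype=complex)
top[0, 0] = 1.0
A_top, _ = covariance_matrix(top, spin_matrices(1.5))
print(min(t - s for s, t, _ in records), np.trace(A_top))
```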


B. Special features for s = 1/2 and s = 1 

For $s = 1/2$, it happens that $L_j$ and $L_k$ (i.e., up to a factor the Pauli matrices) anticommute for different $j, k$, so that

$$s = \tfrac12:\qquad A_{jk}(\rho) = \frac{\delta_{jk}}{4} - \lambda_j\lambda_k. \tag{18}$$

The eigenvalues are $(\mu_1,\mu_2,\mu_3) = \frac14\left(1, 1, 1 - 4|\lambda|^2\right)$. Of course, pure states are characterized by $|\lambda| = 1/2$. The uncertainties are $v(e_k) = 1/4 - \lambda_k^2$, and so the uncertainty region is described by a triangle.
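For spin $1/2$ the formula (18) can be confirmed with the Pauli matrices. This is a minimal sketch under our own choices (a random density matrix, and the helper name `covariance_matrix` for (11) and (12)); for the pure state $|{+1/2}\rangle$ it reproduces the variances $(1/4, 1/4, 0)$, a point of the triangle $\sum_k v(e_k) = 1/2$.

```python
import numpy as np

# spin-1/2 components L_j = sigma_j / 2
sigma = [np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]]),
         np.array([[1, 0], [0, -1]], dtype=complex)]
L = [p / 2 for p in sigma]


def covariance_matrix(rho, L):
    """Matrix A(rho) from (11) and mean vector lambda from (12)."""
    lam = np.array([np.trace(rho @ Lj).real for Lj in L])
    sym = np.array([[np.trace(rho @ (Lj @ Lk + Lk @ Lj)).real / 2 for Lk in L] for Lj in L])
    return sym - np.outer(lam, lam), lam


rng = np.random.default_rng(3)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = g @ g.conj().T
rho /= np.trace(rho).real

A, lam = covariance_matrix(rho, L)
print(np.allclose(A, np.eye(3) / 4 - np.outer(lam, lam)))  # Eq. (18)
mu = np.sort(np.linalg.eigvalsh(A))[::-1]  # mu1 >= mu2 >= mu3
print(np.allclose(mu, [0.25, 0.25, 0.25 - lam @ lam]))

pure = np.array([[1, 0], [0, 0]], dtype=complex)  # |+1/2><+1/2|, |lambda| = 1/2
A_pure, lam_pure = covariance_matrix(pure, L)
print(np.diag(A_pure))  # should be (1/4, 1/4, 0)
```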

The case $s = 1$ is still special because the $3 + 6$ operators $L_j$ and $(L_jL_k + L_kL_j)/2$ form a basis of the operators on $\mathbb C^3$. Therefore, $\rho$ can be reconstructed from $(\lambda, A)$ and, in particular, the set of pure $\rho$ can be characterized in terms of conditions on the eigenvalues $\mu_k$ and $\lambda_k$. In order to analyze these conditions, let us take the representation of the group SU(2) by real orthogonal matrices. Now consider a vector $\psi\in\mathbb C^3$, which we can split into $\psi = \operatorname{Re}\psi + i\operatorname{Im}\psi$ with $\operatorname{Re}\psi, \operatorname{Im}\psi\in\mathbb R^3$. Note that the real continuous function $t\mapsto\langle\operatorname{Re}(e^{it}\psi), \operatorname{Im}(e^{it}\psi)\rangle$ changes sign between $t = 0$ and $t = \pi/2$, so we can choose a complex phase for $\psi$ to
FIG. 4. The monotone closure of the uncertainty region of spin 1. Since this turns out to be convex, it is equal also to the lower convex hull (see text). Left panel: for two orthogonal spin components. The light gray area belongs to the monotone closure, but these points cannot be realized as uncertainty pairs. The parabolas outline the shape (compare also FIG. 3); the orange lines correspond to coherent states. Right panel: the analogue for three spin components. Projecting this body onto one coordinate plane gives the shape shown in the left panel.


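make $\operatorname{Re}\psi$ and $\operatorname{Im}\psi$ orthogonal. Moreover, we can apply a rotation, so that $\operatorname{Re}\psi$ and $\operatorname{Im}\psi$ are along the first two coordinate axes. Hence, up to a rotation, we have

$$\psi = \begin{pmatrix}\cos t\\ i\sin t\\ 0\end{pmatrix},\qquad \lambda = \begin{pmatrix}0\\ 0\\ 2\sin t\cos t\end{pmatrix},\qquad A = \begin{pmatrix}\tau&0&0\\ 0&1-\tau&0\\ 0&0&1-4\tau(1-\tau)\end{pmatrix}, \tag{19}$$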

where $t\in[-\pi,\pi]$, or $\tau = (\sin t)^2\in[0,1]$. In the three-component diagram the curve parameterized by $\tau$ is a parabola lying in a diagonal plane. This parabola and the two copies arising by coordinate permutation are shown in FIG. 3, as well as the body of uncertainty triples of all pure states, which arises by adding to each point on the parabola the hexagon formed by its permutation orbit. A paper cutout model of this solid is provided as a supplement [1].
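The parametrization (19) can be verified numerically in the real orthogonal representation $(L_k)_{ij} = -i\varepsilon_{kij}$ of spin 1. The sketch below is our own check (the helper names `moments` and `predicted` are hypothetical); at $t = \pi/4$ it gives $A = \operatorname{diag}(1/2, 1/2, 0)$, i.e., a spin coherent state.

```python
import numpy as np

# spin 1 in the real orthogonal (vector) representation: (L_k)_{ij} = -i eps_{kij}
eps = np.zeros((3, 3, 3))
for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, j, k] = 1.0
    eps[i, k, j] = -1.0
L = [-1j * eps[k] for k in range(3)]


def moments(psi):
    """Mean vector lambda (12) and matrix A (11) of the pure state psi."""
    lam = np.array([np.vdot(psi, Lk @ psi).real for Lk in L])
    sym = np.array([[np.vdot(psi, (Lj @ Lk + Lk @ Lj) @ psi).real / 2 for Lk in L] for Lj in L])
    return lam, sym - np.outer(lam, lam)


def predicted(t):
    """Right-hand sides of (19) with tau = sin(t)^2."""
    tau = np.sin(t) ** 2
    lam = np.array([0.0, 0.0, 2 * np.sin(t) * np.cos(t)])
    A = np.diag([tau, 1 - tau, 1 - 4 * tau * (1 - tau)])
    return lam, A


checks = []
for t in np.linspace(-np.pi, np.pi, 9):
    psi = np.array([np.cos(t), 1j * np.sin(t), 0.0])
    lam, A = moments(psi)
    lam_p, A_p = predicted(t)
    checks.append(np.allclose(lam, lam_p) and np.allclose(A, A_p))
print(all(checks))
```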


C. General minimization method 

Consider now, a little more generally, any collection of hermitian operators $A_1,\dots,A_n$. We can then form, for any state $\rho$, the variance $n$-tuple $(\Delta^2_\rho(A_1),\dots,\Delta^2_\rho(A_n))$, and ask which region $\Omega\subset\mathbb R^n$ is filled when $\rho$ runs over the whole state space. We call $\Omega$ the uncertainty region of the operator tuple. Typically, this is not a convex set, because $\Delta^2_\rho(A)$ contains a term quadratic in $\rho$, which consequently does not respect convex mixtures. $\Omega$ will be connected (as a continuous image of the convex state space), but beyond that there are few general facts.
It can happen that starting from a point in the uncertainty region we can leave the region by increasing one of the coordinates, i.e., the region encodes upper bounds on variances as well as lower bounds. This is clearly not relevant to the theme of uncertainty relations, where we ask for universal lower bounds only. We can therefore consider the monotone closure of the uncertainty region, by including all points with larger uncertainties, i.e.,

$$\Omega^+ = \{(x_1,\dots,x_n) \mid \exists\rho\ \forall\alpha:\ x_\alpha \ge \Delta^2_\rho(A_\alpha)\}. \tag{20}$$

This is still not necessarily a convex set. We will denote the convex hull of $\Omega^+$ by $\Omega^\vee$ and call it the lower convex hull of $\Omega$ (see FIG. 4). It is this set which has an efficient characterization. Indeed, as a closed convex set it is the intersection of all half spaces containing it, and the monotonicity condition restricts these half spaces to those whose normal vector $w$ has all components non-negative. In other words,

$$\Omega^\vee = \Big\{(x_1,\dots,x_n) \;\Big|\; \forall w_\alpha \ge 0:\ \sum_\alpha w_\alpha x_\alpha \ge m(w_1,\dots,w_n)\Big\}, \tag{21}$$

$$m(w_1,\dots,w_n) = \inf_\rho \sum_\alpha w_\alpha\,\Delta^2_\rho(A_\alpha) = \inf_\rho\,\inf_{a}\, \sum_\alpha w_\alpha \operatorname{tr}\rho\,(A_\alpha - a_\alpha\mathbb 1)^2. \tag{22}$$
In this double infimum we can exchange the order, leading to two kinds of operations: with fixed $\rho$ the global minimum over the $a_\alpha$ is obviously at $a_\alpha = \operatorname{tr}\rho A_\alpha$. On the other hand, with fixed $a_\alpha$ the global minimum over $\rho$ is computed by finding the ground state of the positive semidefinite operator $H(a) = \sum_\alpha w_\alpha (A_\alpha - a_\alpha\mathbb 1)^2$.

An efficient algorithm is therefore obtained by alternating between these two steps. The upper estimates on m 
obtained in this way are non-increasing and in practice converge quite well, and independently of the starting 
value. However, we do not have a theorem to this effect. An analytic consequence of this algorithm (independent 
of convergence) is that we can restrict the infimum to pure states, since this is sufficient to get the ground state 
energies. 
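The alternating scheme can be sketched in a few lines. This is our own minimal implementation, not the authors' code: the names `alternating_minimization` and `weighted_variance_sum`, the random pure starting state and the fixed iteration count are assumptions, and, as noted above, convergence to the global infimum is not guaranteed in general. For the spin-1 pair $(L_1, L_3)$ with $w = (1, 1)$ it should approach $7/16$ (cf. (26)), and for $(L_1, L_2, L_3)$ with $w = (1,1,1)$ it should reach the bound $s$ from (17).

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))  # Eq. (2)
    l_minus = l_plus.T
    return ((l_plus + l_minus) / 2).astype(complex), (l_plus - l_minus) / 2j, np.diag(m).astype(complex)


def weighted_variance_sum(psi, ops, weights):
    """sum_alpha w_alpha * Var_psi(A_alpha) for a normalized pure state psi."""
    total = 0.0
    for w, A in zip(weights, ops):
        mean = np.vdot(psi, A @ psi).real
        second = np.vdot(psi, A @ (A @ psi)).real
        total += w * (second - mean ** 2)
    return total


def alternating_minimization(ops, weights, n_iter=100, seed=0):
    """Upper estimate of m(w) in (22) by alternating the two partial minimizations."""
    dim = ops[0].shape[0]
    eye = np.eye(dim)
    rng = np.random.default_rng(seed)
    psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)  # random pure starting state
    psi /= np.linalg.norm(psi)
    for _ in range(n_iter):
        # fixed state: the optimal shifts are the means a_alpha = tr(rho A_alpha)
        a = [np.vdot(psi, A @ psi).real for A in ops]
        # fixed shifts: the optimal state is a ground state of H(a)
        H = sum(w * (A - x * eye) @ (A - x * eye) for w, A, x in zip(weights, ops, a))
        _, vecs = np.linalg.eigh(H)
        psi = vecs[:, 0]
    return weighted_variance_sum(psi, ops, weights)


L1, L2, L3 = spin_matrices(1)
print(alternating_minimization([L1, L3], [1.0, 1.0]))
print(alternating_minimization([L1, L2, L3], [1.0, 1.0, 1.0]))
```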

The algorithm is then run for a suitable set of tuples $(w_1,\dots,w_n)$, so that for each run one obtains a tangent plane to $\Omega^\vee$, but also the state $\rho$ and with it the tuple of variances in $\Omega$. We illustrate the results in FIG. 4 for the case of spin 1 and the operator tuples $(L_1, L_2)$ and $(L_1, L_2, L_3)$, respectively. For low spin these diagrams can be determined analytically (see the next subsection). The most prominent feature of the two-component diagram is the symmetric linear bound, which depends on $s$ and is determined in subsection IID.


D. The linear two-component bound 

For every $s$, let $c_2(s)$ be the best constant in the inequality

$$\Delta^2_\rho(L_1) + \Delta^2_\rho(L_3) \geq c_2(s). \tag{23}$$

For $s = 1/2, 1$ it is readily computed from the eigenvalues of $A$ given in Sect. IIB. For arbitrary $s$ we can use a slightly simplified version of the variational principle (21). We have $w_1 = w_3 = 1$, and can assume that $a_1 = 0$ by rotation invariance around the 2-axis. Thus

$$c_2(s) = \inf_\phi\,\inf_a\,\langle\phi|\, s(s+1)\mathbb 1 - L_2^2 - 2aL_3 + a^2\mathbb 1\,|\phi\rangle, \tag{24}$$

where the first infimum runs over all pure states (for fixed $a$ a ground state problem) and $a$ over the reals (for fixed $\phi$ the optimal $a$ is the expectation value of $L_3$). One notes that in this operator only matrix elements with even $m - m'$ are non-zero, so the problem can be further reduced. Up to $s = 3/2$ it effectively leads to two-dimensional ground state problems. In this way (resp. by using the results of Sect. IIB) we get

$$c_2(1/2) = \frac14, \tag{25}$$

$$c_2(1) = \frac{7}{16}, \tag{26}$$

$$c_2(3/2) = \frac94 + \gamma^2 - \sqrt{4\gamma^2 + 2\gamma + 1} \approx 0.600933, \tag{27}$$

where $\gamma = \cos(\pi/9)$.

Note that the bound $c_2(1)$ was already obtained in [11]. It is readily seen numerically that $c_2(s)$ increases with $s$, but sub-linearly. This means that if we scale the diagram of $\Omega^\vee$ (see FIG. 4, right) by a factor $1/s$ so that the bottom triangle described by (17) stays fixed, the two-component inequality excludes an asymptotically small prism around the axes. FIG. 5 shows the asymptotic behavior of $c_2$ in a log-log plot, which suggests that

$$c_2(s) \approx 0.569524\, s^{2/3} \quad\text{for large } s. \tag{28}$$
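Because only the single real parameter $a$ remains in (24), a plain grid scan over $a\in[-s,s]$ (the range of possible expectation values of $L_3$) combined with exact diagonalization is enough to reproduce (25) to (27). This is a hedged sketch of our own, not the procedure used for FIG. 5; the grid size is an arbitrary choice, and since every grid value is an actual ground state energy, the scan can only overestimate $c_2(s)$, by an amount that shrinks with the grid spacing.

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))  # Eq. (2)
    l_minus = l_plus.T
    return ((l_plus + l_minus) / 2).astype(complex), (l_plus - l_minus) / 2j, np.diag(m).astype(complex)


def c2(s, n_grid=4001):
    """Best constant in (23) via (24): minimize the ground state energy over a."""
    _, L2, L3 = spin_matrices(s)
    eye = np.eye(L3.shape[0])
    base = s * (s + 1) * eye - L2 @ L2
    best = np.inf
    # the optimal a is an expectation value of L3, hence lies in [-s, s]
    for a in np.linspace(-s, s, n_grid):
        ground = np.linalg.eigvalsh(base - 2 * a * L3 + a ** 2 * eye)[0]
        best = min(best, ground)
    return best


gamma = np.cos(np.pi / 9)
exact = {
    0.5: 1 / 4,  # (25)
    1: 7 / 16,  # (26)
    1.5: 9 / 4 + gamma ** 2 - np.sqrt(4 * gamma ** 2 + 2 * gamma + 1),  # (27)
}
for s, value in exact.items():
    print(s, c2(s), value)
```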


E. Power mean and maximal uncertainty 

A natural way to characterize states with small variance is to look for the maximum of the variance function $v(e)$ defined in (10). An uncertainty relation would then put a lower bound $c(s)$ on this maximum. In other words, we would like to prove the following statement: for every state $\rho$ there is some direction $e$ such that $\Delta^2_\rho(E^e) \ge c(s)$. By considering coherent states we can immediately see that $c(s) \le s/2$. The following proposition shows that coherent states in fact have minimal variance in terms of this criterion, so that we even have equality, $c(s) = s/2$.

Such a result can be seen as one end of a one-parameter family of criteria, of which (16) is the other end: we can judge the “size” of the function $v$ by its $L^p$ norm, of which the maximum is the special case $p = \infty$, and the mean the case $p = 1$. We therefore formulate a proposition to cover all these cases.

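Proposition 1. For every $s\in\mathbb N/2$ and every $p\in[1,\infty]$ there is a constant $c(p,s)$ such that, for every density operator $\rho$ in the spin-$s$ representation (with $de$ the normalized surface measure on the unit sphere),

$$\|v\|_p = \left(\int de\; \Delta^2_\rho(e\cdot L)^p\right)^{1/p} \geq c(p,s), \tag{29}$$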
FIG. 5. Left: Log-log plot of the numerical calculations of the two-component bound $c_2(s)$ (black) and the resulting fit (28) (blue). Right: Numerics and the fit for small $s$.


with equality whenever $\rho$ is a spin coherent state. For $p < \infty$ these are the only states with equality. For $p = \infty$ equality holds also for mixtures $\rho = p_+|{+s}\rangle\langle{+s}| + p_-|{-s}\rangle\langle{-s}|$, and rotations thereof, provided that $p_+p_- \le 1/(8s)$.

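The constant is

$$c(p,s) = \frac{s}{2}\left(\frac{\sqrt{\pi}\,\Gamma(p+1)}{2\,\Gamma\!\left(p+\frac32\right)}\right)^{1/p}, \tag{30}$$

with special values $c(1,s) = s/3$, $c(2,s) = \frac{s}{2}\sqrt{8/15} \approx 0.365\,s$, $c(\infty,s) = s/2$.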


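Proof. Let $\lambda_j = \operatorname{tr}\rho L_j$ be the vector of expectation values, and consider the set of density operators $\rho_\beta$ arising from $\rho$ by rotation $R_\beta$ around the vector $\lambda$ by the angle $\beta$. For each $\rho_\beta$, we call the variance function $v_\beta(e) = v(R_\beta e)$. By averaging over $\beta$ we find a state $\bar\rho = \frac{1}{2\pi}\int_0^{2\pi} d\beta\,\rho_\beta$ with variance function

$$\bar v(e) = \frac{1}{2\pi}\int_0^{2\pi} d\beta\; v_\beta(e), \tag{31}$$

where we used, crucially, that all $\rho_\beta$ and $\bar\rho$ have the same expectations $\lambda_j$, so that the term quadratic in the state in (9) is unchanged by the averaging. By the triangle inequality for the $p$-norm, we have $\|\bar v\|_p \le \|v\|_p$. Hence we can restrict the search for the $\rho$ with minimal $\|v\|_p$ to those which are rotation invariant around some axis, say the 3-axis.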

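Such a state can be jointly diagonalized with $L_3$, and is hence of the form $\rho = \sum_m p_m|m\rangle\langle m|$. Then $\lambda = \left(0, 0, \sum_m p_m m\right)$, and $A(\rho)$ is diagonal with

$$A_{11} = A_{22} = \frac12\Big(s(s+1) - \sum_m p_m m^2\Big) \geq \frac12\left(s^2 + s - s^2\right) = \frac{s}{2}, \tag{32}$$

$$A_{33} = \sum_m p_m m^2 - \Big(\sum_m p_m m\Big)^2 \geq 0, \tag{33}$$

$$v(e) = (e_1^2 + e_2^2)\,A_{11} + e_3^2\,A_{33}. \tag{34}$$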


The last equation shows that the function $v$ becomes pointwise smaller (and hence smaller in $p$-norm) if we decrease $A_{11}$ or $A_{33}$. That is, we have to go to the minimum in both $A_{11}$ and $A_{33}$. The minimum in (32) is attained precisely when $p_m \neq 0$ only for $m = \pm s$. Then minimality in (33) forces $\rho$ to be a spin coherent state. For $p = \infty$ the norm only sees the maximum, so the pointwise minimum need not be chosen, and we may allow $0 \le A_{33} \le A_{11}$ without changing the maximum. The latter inequality translates to the one given in Prop. 1.

The concrete constants follow easily by integrating the $p$th power of (34) with $A_{11} = s/2$ and $A_{33} = 0$ with respect to the normalized surface measure on the sphere, i.e.,

$$c(p,s) := \frac{s}{2}\left(\frac12\int_0^\pi \sin\theta\,d\theta\,(\sin\theta)^{2p}\right)^{1/p}. \tag{35}$$


□ 
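The closed form (30) and the integral (35) can be cross-checked numerically. In this sketch (our illustration; the function names and the midpoint rule are our own choices) the Gamma functions are evaluated through `math.lgamma` to avoid overflow for large $p$, and $p = \infty$ is handled as the limit $s/2$.

```python
import math

import numpy as np


def c_closed_form(p, s):
    """Eq. (30): (s/2) * (sqrt(pi) Gamma(p+1) / (2 Gamma(p+3/2)))**(1/p)."""
    if math.isinf(p):
        return s / 2
    log_mean = (0.5 * math.log(math.pi) + math.lgamma(p + 1)
                - math.log(2) - math.lgamma(p + 1.5))
    return (s / 2) * math.exp(log_mean / p)


def c_integral(p, s, n=20000):
    """Eq. (35) evaluated with the midpoint rule on (0, pi); finite p only."""
    theta = (np.arange(n) + 0.5) * np.pi / n
    mean_power = 0.5 * np.sum(np.sin(theta) ** (2 * p + 1)) * np.pi / n
    return (s / 2) * mean_power ** (1 / p)


for p in (1, 2, 5, 50, math.inf):
    print(p, c_closed_form(p, 1.0))
```

Since $L^p$ norms with respect to a probability measure increase with $p$, the printed values increase towards $s/2$.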
F. Robertson’s technique: a generalization


We have criticized the Robertson inequality (1) for not giving a state-independent bound. However, with only little effort it can be used to derive such a bound. Indeed, abbreviating $v_j = \Delta^2_\rho(L_j)$ and $\lambda_j = \operatorname{tr}\rho L_j$, we can add the three inequalities of the form $v_1v_2 \ge \lambda_3^2/4$ and use that $\sum_j (v_j + \lambda_j^2) = s(s+1)$ to obtain

$$v_1v_2 + v_2v_3 + v_3v_1 \geq \frac14\left(s(s+1) - (v_1 + v_2 + v_3)\right). \tag{36}$$
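Inequality (36) can be probed on random states. The sketch below is illustrative only (random pure states, helper names and tolerances are our own choices); it also shows that a spin coherent state along a coordinate axis, with $v = (s/2, s/2, 0)$, turns (36) into an equality.

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))  # Eq. (2)
    l_minus = l_plus.T
    return ((l_plus + l_minus) / 2).astype(complex), (l_plus - l_minus) / 2j, np.diag(m).astype(complex)


def robertson_sum_sides(rho, s):
    """Left- and right-hand sides of (36) for the state rho."""
    v = []
    for Lj in spin_matrices(s):
        mean = np.trace(rho @ Lj).real
        v.append(np.trace(rho @ Lj @ Lj).real - mean ** 2)
    v1, v2, v3 = v
    lhs = v1 * v2 + v2 * v3 + v3 * v1
    rhs = (s * (s + 1) - (v1 + v2 + v3)) / 4
    return lhs, rhs


rng = np.random.default_rng(4)
gaps = []
for s in (0.5, 1, 1.5, 2):
    dim = int(round(2 * s)) + 1
    for _ in range(50):
        vec = rng.normal(size=dim) + 1j * rng.normal(size=dim)
        psi = vec / np.linalg.norm(vec)  # random pure state
        lhs, rhs = robertson_sum_sides(np.outer(psi, psi.conj()), s)
        gaps.append(lhs - rhs)

top = np.zeros((3, 3), dtype=complex)
top[0, 0] = 1.0  # |s><s| for s = 1
print(min(gaps), robertson_sum_sides(top, 1))
```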


Clearly, (36) no longer allows $v_1 = v_2 = 0$, since $v_3 \le s^2$. The set of variance triples satisfying this is shown in FIG. 6. Comparison with FIG. 4 readily shows that this bound is not optimal. However, rather than extend Robertson’s two-component result in this trivial way, we can generalize his technique from two to three components. The basis of the technique is the observation that for any finite collection of operators $X_j$ (not necessarily hermitian or normal) the matrix $M_{jk} = \operatorname{tr}\rho X_j^*X_k$ is positive semidefinite, which is the same as saying that for any complex linear combination $X = \sum_j \alpha_jX_j$ the expectation of $X^*X$ must be non-negative. In order to get Robertson’s inequality for $L_1, L_2$ this idea is applied to the three operators $\mathbb 1, L_1, L_2$. In fact, this leads to Schrödinger’s improvement of the inequality [28], which also contains the square of the covariance matrix element $A_{12}(\rho)^2$ on the right-hand side.

We will apply the method to the four operators $\mathbb 1, L_1, L_2, L_3$. In order to simplify the expressions, however, we will not look for variances and the off-diagonal elements of $A(\rho)$, but for inequalities involving the eigenvalues $\mu_j$. As discussed at the beginning of this section, this will contain all the information needed. In other words, we will take the matrix $A(\rho)$ as the diagonal matrix with entries $\mu_1 \ge \mu_2 \ge \mu_3$. The matrix which then needs to be positive is


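$$M = \begin{pmatrix}
1 & \lambda_1 & \lambda_2 & \lambda_3\\
\lambda_1 & \mu_1 + \lambda_1^2 & \lambda_1\lambda_2 + i\lambda_3/2 & \lambda_1\lambda_3 - i\lambda_2/2\\
\lambda_2 & \lambda_1\lambda_2 - i\lambda_3/2 & \mu_2 + \lambda_2^2 & \lambda_2\lambda_3 + i\lambda_1/2\\
\lambda_3 & \lambda_1\lambda_3 + i\lambda_2/2 & \lambda_2\lambda_3 - i\lambda_1/2 & \mu_3 + \lambda_3^2
\end{pmatrix}. \tag{37}$$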


The positivity of this matrix is equivalent (see e.g. [14, Thm. 7.2.5]) to the positivity of the principal minors, i.e., the determinants of the submatrices of the first $k$ rows and columns for $k = 1, 2, 3, 4$. The first three of these are $1$, $\mu_1$, and $\mu_1\mu_2 - \lambda_3^2/4$. The positivity of the third one is Robertson’s inequality (1). The only remaining condition for the positivity of $M$ is $\det M \ge 0$, which evaluates to

$$\mu_1\mu_2\mu_3 - \frac14\left(\lambda_1^2\mu_1 + \lambda_2^2\mu_2 + \lambda_3^2\mu_3\right) \geq 0. \tag{38}$$
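The Gram-matrix argument is easy to reproduce numerically. In this sketch (helper names and the random state are our own choices) we build $M_{jk} = \operatorname{tr}\rho X_j^*X_k$ for $X = (\mathbb 1, L_1, L_2, L_3)$ in the original frame, check that it is positive semidefinite, and compare $\det M$ with the left-hand side of (38) evaluated in the principal-axis frame of $A(\rho)$. The determinant does not depend on this change of frame, because the frame change acts on $M$ by an orthogonal congruence.

```python
import numpy as np


def spin_matrices(s):
    """L1, L2, L3 for spin s in the basis |m>, m = s, s-1, ..., -s (hbar = 1)."""
    m = s - np.arange(int(round(2 * s)) + 1)
    dim = len(m)
    l_plus = np.zeros((dim, dim))
    for k in range(1, dim):
        l_plus[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))  # Eq. (2)
    l_minus = l_plus.T
    return ((l_plus + l_minus) / 2).astype(complex), (l_plus - l_minus) / 2j, np.diag(m).astype(complex)


def covariance_matrix(rho, L):
    """Matrix A(rho) from (11) and mean vector lambda from (12)."""
    lam = np.array([np.trace(rho @ Lj).real for Lj in L])
    sym = np.array([[np.trace(rho @ (Lj @ Lk + Lk @ Lj)).real / 2 for Lk in L] for Lj in L])
    return sym - np.outer(lam, lam), lam


def gram_matrix(rho, L):
    """M_jk = tr(rho X_j^* X_k) for X = (1, L1, L2, L3)."""
    ops = [np.eye(rho.shape[0])] + list(L)
    return np.array([[np.trace(rho @ Xj.conj().T @ Xk) for Xk in ops] for Xj in ops])


rng = np.random.default_rng(5)
s = 1.5
L = spin_matrices(s)
dim = L[0].shape[0]
g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
rho = g @ g.conj().T
rho /= np.trace(rho).real

M = gram_matrix(rho, L)
A, lam = covariance_matrix(rho, L)
mu, R = np.linalg.eigh(A)
lam_axes = R.T @ lam  # mean vector in the principal-axis frame
lhs_38 = np.prod(mu) - (lam_axes ** 2 @ mu) / 4
print(np.linalg.eigvalsh(M).min(), np.linalg.det(M).real, lhs_38)
```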

Condition (38) will be combined with the normalization condition

$$\lambda_1^2 + \lambda_2^2 + \lambda_3^2 = s(s+1) - (\mu_1 + \mu_2 + \mu_3). \tag{39}$$

The condition on the triples $(\mu_1, \mu_2, \mu_3)$ we have to evaluate is the existence of $\lambda_j$ satisfying both these relations. Since only the squares enter, let us set $x_j = \lambda_j^2$. Then (39) describes a triangle in the positive octant with equal intercept $s(s+1) - (\mu_1 + \mu_2 + \mu_3)$ with the axes. The inequality (38) describes a tetrahedron spanned by the origin and the axis intercepts $x_1^0 = 4\mu_2\mu_3$, and cyclic. Note that Robertson’s inequality is automatically satisfied on this tetrahedron. Obviously the tetrahedron and the triangle intersect if and only if one of the axis intercepts of the tetrahedron reaches or lies above the triangle. Since we can take the eigenvalues ordered, $\mu_1 \ge \mu_2 \ge \mu_3$, this means

$$4\mu_1\mu_2 \geq s(s+1) - (\mu_1 + \mu_2 + \mu_3). \tag{40}$$

This is a bound on the eigenvalues of the matrix $\Delta(\rho)$. By Birkhoff’s theorem, the set of variance triples arising from such $\Delta(\rho)$ also includes all convex combinations of permutations of the $\mu_j$ (see the beginning of Sect. II). In order to characterize the set of variance triples generated in this way we need the following Lemma. In its formulation the variables $\sigma \in S_3$ run over the permutation group on three elements, and are applied to the components of a 3-vector (see also FIG. 2).
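Before turning to the Lemma, the chain from the matrix $M$ to (40), and the Birkhoff step, can be checked numerically. In this sketch (NumPy assumed, function names hypothetical) we rotate to the principal axes of the covariance matrix with a proper rotation $R$, compare $\det M$ with the closed form (38), evaluate the slack in (40), and write the variances as $B\mu$ with the doubly stochastic matrix $B_{jk} = R_{jk}^2$.

```python
import numpy as np

def spin_matrices(s):
    """Spin-s matrices L1, L2, L3 in the L3 eigenbasis ordered m = s, ..., -s."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    lm = lp.conj().T
    return [(lp + lm) / 2, (lp - lm) / 2j, np.diag(m).astype(complex)]

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def analyze(rho, s):
    """Return det M, the left side of (38), the slack in (40), the doubly
    stochastic matrix B, the variances diag(Delta) and the eigenvalues mu."""
    L = spin_matrices(s)
    d = L[0].shape[0]
    lam = np.array([np.trace(rho @ Lj).real for Lj in L])
    cov = np.array([[np.trace(rho @ (Lj @ Lk + Lk @ Lj)).real / 2 for Lk in L]
                    for Lj in L]) - np.outer(lam, lam)
    mu, R = np.linalg.eigh(cov)
    mu, R = mu[::-1], R[:, ::-1].copy()      # mu_1 >= mu_2 >= mu_3
    if np.linalg.det(R) < 0:                 # use a proper rotation so that the
        R[:, 2] *= -1                        # rotated L'_j keep the commutation relations
    L_rot = [sum(R[k, j] * L[k] for k in range(3)) for j in range(3)]
    lam_rot = R.T @ lam
    ops = [np.eye(d)] + L_rot
    M = np.array([[np.trace(rho @ a.conj().T @ b) for b in ops] for a in ops])
    lhs_38 = mu.prod() - 0.25 * np.sum(lam_rot ** 2 * mu)
    slack_40 = 4 * mu[0] * mu[1] - (s * (s + 1) - mu.sum())
    B = R ** 2                               # variances = B @ mu (Birkhoff step)
    return np.linalg.det(M).real, lhs_38, slack_40, B, np.diag(cov), mu
```

For the spin coherent state $|s\rangle$ the slack in (40) vanishes, consistent with the coherent states forming the base triangle of the uncertainty region.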
FIG. 6. Left: Region bounded by the inequality (36). Right: Plane of constant $\gamma$ orthogonal to the (1,1,1) direction.


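Lemma 2. With the notation from above, the following sets $K_1$ and $K_2$ are equal.

$$K_1 = \bigcup_{\gamma \ge s} H_1(\gamma), \quad\text{where}\quad H_1(\gamma) = \operatorname{conv}\Bigl(\bigcup_{\sigma\in S_3}\sigma\bigl(h_1(\gamma)\bigr)\Bigr), \quad\text{with} \tag{41}$$

$$h_1(\gamma) = \Bigl\{(\mu_1,\mu_2,\mu_3) \mid \sum_i \mu_i = \gamma,\ \mu_1 \ge \mu_2 \ge \mu_3 \ge 0,\ 4\mu_1\mu_2 \ge s(s+1) - \gamma\Bigr\},$$

and

$$K_2 = \bigcup_{\gamma \ge s} H_2(\gamma), \quad\text{where}\quad H_2(\gamma) = \bigcup_{\sigma\in S_3}\sigma\bigl(h_2(\gamma)\bigr), \quad\text{with} \tag{42}$$

$$h_2(\gamma) = \Bigl\{(\nu_1,\nu_2,\nu_3) \mid \sum_i \nu_i = \gamma,\ \nu_1 \ge \nu_2 \ge \nu_3 \ge 0,\ 4\nu_1(\nu_2 + \nu_3) \ge s(s+1) - \gamma\Bigr\}.$$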

Proof. For the equality of $K_1$ and $K_2$ it is sufficient to show that $H_1(\gamma)$ and $H_2(\gamma)$ coincide for every $\gamma$. The restriction $\sum_i \nu_i = \sum_i \mu_i = \gamma$, together with the 3-fold symmetry of the problem, tells us that $H_1(\gamma)$ and $H_2(\gamma)$ are subsets of the triangle whose corners lie on the axes at a distance $\gamma$ from the origin. In this triangle the ordering of the $\nu_i$ and $\mu_i$ reduces $h_1$ and $h_2$ to the dashed subset marked in FIG. 6.

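Now the first and last condition in the definition of $h_2$ can be combined to obtain

$$4\nu_1(\gamma - \nu_1) \ge s(s+1) - \gamma, \tag{43}$$

so we get

$$0 \ge \nu_1^2 - \gamma\nu_1 + \frac{s(s+1) - \gamma}{4}, \tag{44}$$

$$\nu_1 \le \frac{\gamma}{2} \pm \frac{1}{2}\sqrt{\gamma^2 + \gamma - s(s+1)} =: c(\gamma). \tag{45}$$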

Because $\nu_i \ge 0$ we have to choose the positive sign, which means that $H_2(\gamma)$ is the intersection of the triangle with the three halfspaces $\nu_i \le c(\gamma)$, whose boundaries are marked as orange lines in FIG. 6, i.e.,

$$H_2(\gamma) = \Bigl\{(\nu_1,\nu_2,\nu_3) \mid \sum_i \nu_i = \gamma,\ 0 \le \nu_i \le c(\gamma)\Bigr\}. \tag{46}$$


$H_2(\gamma)$ is clearly a convex polytope. The extremal points of $H_2(\gamma)$ have to saturate two of the defining inequalities. In the ordered triangle ($\nu_1 \ge \nu_2 \ge \nu_3$) the only extreme point is given by $p(\gamma) := (c(\gamma), \gamma - c(\gamma), 0)$, and all others can be obtained by permutations. Hence $H_2(\gamma)$ can be described as the hexagon $H_2(\gamma) = \operatorname{conv}\bigl(\bigcup_\sigma \sigma(p(\gamma))\bigr)$. On the one hand, by comparing the defining inequalities for $h_1$ and $h_2$, we can see that every triple $\mu \in h_1(\gamma)$ is also part of $h_2(\gamma)$, since $\mu_1\mu_2 \le \mu_1(\mu_2 + \mu_3)$. So including the permutations and using the fact that $H_2(\gamma)$ is convex, we get $H_1(\gamma) \subseteq H_2(\gamma)$.

On the other hand, the 3-component of $p(\gamma)$ is zero, so $p(\gamma) \in h_1(\gamma) \subset H_1(\gamma)$. Since $p(\gamma)$ and its permutations are the extremal points of $H_2(\gamma)$ and $H_1(\gamma)$ is convex, we have $H_2(\gamma) \subseteq H_1(\gamma)$. □
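The quantities in (45) and (46) are easy to tabulate. A small sketch (plain Python; function names are our own) computes $c(\gamma)$, using $\gamma^2 + \gamma - s(s+1) = (\gamma - s)(\gamma + s + 1)$, and the extreme points $\sigma(p(\gamma))$ of the hexagon; the relation $4c(\gamma)(\gamma - c(\gamma)) = s(s+1) - \gamma$ confirms that $p(\gamma)$ lies in $h_1(\gamma)$, as used in the proof. We restrict to $s \le \gamma \le s(s+1)$, where $\gamma - c(\gamma) \ge 0$.

```python
import itertools
import math

def c_gamma(gamma, s):
    """Upper root c(gamma) of (44), cf. (45); real because
    gamma^2 + gamma - s(s+1) = (gamma - s)(gamma + s + 1) >= 0 for gamma >= s."""
    disc = (gamma - s) * (gamma + s + 1)
    if disc < 0:
        raise ValueError("c(gamma) is only defined for gamma >= s")
    return gamma / 2 + math.sqrt(disc) / 2

def hexagon_vertices(gamma, s):
    """Distinct permutations of p(gamma) = (c, gamma - c, 0), the extreme points of H_2(gamma)."""
    c = c_gamma(gamma, s)
    p = (c, gamma - c, 0.0)
    return sorted(set(itertools.permutations(p)))
```

At $\gamma = s$ the hexagon degenerates to the base triangle with corners $(s/2, s/2, 0)$ and its permutations.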
FIG. 7. Region bounded by the generalized Robertson inequality. Left: Hyperbolic curves on the faces for the eigenvalues. Right: Uncertainty region given by the variances and the base triangle formed by the spin coherent states.


Therefore we get the following statement: 

Proposition 3. Let $\nu_1 \ge \nu_2 \ge \nu_3 \ge 0$ be variances of the angular momentum components. Then

$$4\nu_1(\nu_2 + \nu_3) \ge s(s+1) - (\nu_1 + \nu_2 + \nu_3). \tag{47}$$

As one can see in FIG. 6, the boundaries of the corresponding uncertainty region on the coordinate planes are given by permutations of the hyperbolic curve $4\nu_1\nu_2 = s(s+1) - \nu_1 - \nu_2$. This uncertainty region is monotonically closed and given by the convex hull of the above hyperbolic curves. This is shown in FIG. 7.


G. Asymptotic Case 


Now we take a look at the behavior of the asymptotic uncertainty region for $s \to \infty$. We already know that $\Delta^2_\rho(L_1) + \Delta^2_\rho(L_2) + \Delta^2_\rho(L_3) \ge s$, and hence it is appropriate to scale the problem by $1/s$, which will fix the sum of the variances in the lower base triangle to 1. We start with the asymptotic behavior of the generalized Robertson inequality derived in the previous section. On the scale of $1/s$, i.e., with $\tilde\nu_i = \nu_i/s$ and the ordering $\tilde\nu_1 \ge \tilde\nu_2 \ge \tilde\nu_3$, this inequality reads

$$4\tilde\nu_1(\tilde\nu_2 + \tilde\nu_3) \ge 1 + \frac{1}{s}\bigl(1 - (\tilde\nu_1 + \tilde\nu_2 + \tilde\nu_3)\bigr), \tag{48}$$

and as $s$ goes to infinity the set of possible variances shrinks to

$$4\tilde\nu_1(\tilde\nu_2 + \tilde\nu_3) \ge 1, \tag{49}$$

because $\sum_i \tilde\nu_i \ge 1$. Hence the inequality (48) gets stronger for increasing $s$.

In this section we will show that this bound is attained by states, which will be constructed in the following way. Using the technique described in Sect. II C, we look for the states $\psi$ which minimize the expectation of the operator

$$H(s,w) = \frac{1}{s}\Bigl(w_1(L_1 - \lambda_1)^2 + w_2(L_2 - \lambda_2)^2 + w_3(L_3 - \lambda_3)^2\Bigr)$$

for a normal vector $w$. We do this using the Holstein-Primakoff transformation [12]:

$$L_+ = \sqrt{2s}\,\sqrt{1 - \frac{a^*a}{2s}}\;a, \qquad L_- = \sqrt{2s}\;a^*\sqrt{1 - \frac{a^*a}{2s}}, \tag{50}$$

$$L_3 = s - a^*a. \tag{51}$$


Here $a$ and $a^*$ are the annihilation and creation operators, so we have a representation of the angular momentum algebra in the oscillator basis. For large $s$ and appropriate states, this transformation can be reduced to

$$L_+ = \sqrt{2s}\,a + O(s^{-1/2}), \qquad L_- = \sqrt{2s}\,a^* + O(s^{-1/2}), \qquad L_3 = s - a^*a. \tag{52}$$
Notice that in the Holstein-Primakoff basis, the spin coherent state $|s\rangle$ is transformed to the ground state $|0\rangle_{HP}$, hence the state $|n\rangle_{HP}$ corresponds to $|s - n\rangle$ in the standard angular momentum $L_3$ eigenbasis. Now we rewrite $H$ using the above transformation and the relations for position, $L_1 = \frac{1}{2}(L_+ + L_-) = \sqrt{s/2}\,(a + a^*) = \sqrt{s}\,X$, and momentum, $L_2 = \frac{1}{2i}(L_+ - L_-) = \frac{\sqrt{s/2}}{i}(a - a^*) = \sqrt{s}\,P$, where by (52) these equalities hold up to $O(s^{-1/2})$. We arrive at

$$H(s,w)_{HP} = w_1(X - \xi)^2 + w_2(P - \eta)^2 + \frac{w_3}{s}\bigl(s - a^*a - \zeta\bigr)^2 + O(s^{-1/2}). \tag{53}$$

Here $\xi$, $\eta$ and $\zeta$ denote the transformed expectation values. From Sect. II A we know that $|s\rangle$ has minimal uncertainty for $w \sim (1,1,1)$ and arbitrary $s$. Based on this observation we make the assumption that we are close to the $L_3$ spin coherent state. We thus have $s \gg \langle a^*a\rangle$ and $\lambda_3 \approx s$, hence $\zeta$ is linear in $s$. Furthermore, we can order the weights such that $w_1 \le w_2 \le w_3$ to minimize the expectation value. Now we take the limit of large $s$; the operator converges to the harmonic oscillator

$$H(w)_{HP} = w_1X^2 + w_2P^2. \tag{54}$$


Here we use that the expectation value of the harmonic oscillator is translation-invariant in phase space, so that we can choose $\xi$ and $\eta$ to be zero. The state which minimizes the expectation of this operator is simply the harmonic oscillator ground state $\psi(m,\omega)$, with $m = 1/(2w_2)$ and $\omega = \sqrt{4w_1w_2}$. In the following these will be combined in the parameter $\alpha := m\omega = \sqrt{w_1/w_2}$. For the comparison of this result with numerical calculations using the above described algorithm, we must express these ground states in a common basis $|n\rangle_{HP}$, i.e., decompose $\psi(\alpha)$ in the basis of a harmonic oscillator with $\alpha = 1$. This transformation is given by


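$$\psi_n := \langle n|\psi(\alpha)\rangle = \frac{\alpha^{1/4}}{\sqrt{2^n\pi\,n!}}\int H_n(x)\exp\Bigl(-\frac{1+\alpha}{2}x^2\Bigr)\,dx, \tag{55}$$

which is zero for odd $n$ and can be solved for even $n$ through

$$\int H_n(x)\exp(-cx^2)\,dx = \frac{\sqrt{\pi}\,n!\,(1-c)^{n/2}}{(n/2)!\;c^{(n+1)/2}}. \tag{56}$$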


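The corresponding probability distribution is given by

$$p_n := |\psi_n|^2 = \frac{\sqrt{\alpha}}{1+\alpha}\,\frac{(1-\alpha)^n\,n!}{(2+2\alpha)^n\,\bigl((n/2)!\bigr)^2}\,\bigl(1 + (-1)^n\bigr). \tag{57}$$

Because this is zero for odd $n$, we can set $n = 2k$ and get

$$p_{2k} = \frac{2\sqrt{\alpha}}{1+\alpha}\,\frac{(1-\alpha)^{2k}}{(2+2\alpha)^{2k}}\binom{2k}{k}. \tag{58}$$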


The above approximation does not necessarily yield the optimal states, and it is not rigorously justified so far. As a first step, we compare the distribution $p_n$ with numerically determined ones for finite $s$. These tend to converge, as shown in FIG. 8.
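The decomposition (55) to (58) can be cross-checked numerically. The sketch below (NumPy assumed; function names hypothetical) evaluates the overlap integral (55) with Gauss-Hermite quadrature from `numpy.polynomial.hermite`, which uses the physicists’ Hermite polynomials $H_n$, and compares $|\psi_n|^2$ with the closed form (57)/(58).

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

def psi_n_quadrature(n, alpha, nodes=80):
    """<n|psi(alpha)> from (55), evaluated by Gauss-Hermite quadrature.

    Substituting x = y / sqrt(c) with c = (1 + alpha)/2 turns exp(-c x^2) dx
    into exp(-y^2) dy / sqrt(c); the rule is then exact for n < 2 * nodes.
    """
    c = (1 + alpha) / 2
    y, w = hermgauss(nodes)
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                       # selects H_n
    integral = np.sum(w * hermval(y / np.sqrt(c), coeffs)) / np.sqrt(c)
    return alpha ** 0.25 / math.sqrt(2.0 ** n * math.pi * math.factorial(n)) * integral

def p_closed_form(n, alpha):
    """Occupation probabilities (57)/(58); zero for odd n."""
    if n % 2:
        return 0.0
    k = n // 2
    return (2 * math.sqrt(alpha) / (1 + alpha)
            * ((1 - alpha) / (2 + 2 * alpha)) ** (2 * k) * math.comb(2 * k, k))
```

Since only $|\psi_n|^2$ enters (57), the sign convention of the oscillator eigenfunctions plays no role in this comparison.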

Theorem 4. The lower bound of the asymptotic uncertainty region on a scale of $1/s$ is fully described by the generalized Robertson inequality (49) and is saturated by the states $\psi(\alpha)$.

Proof. First we will show that the approximation (52) is justified for $\psi(\alpha)$ and evaluate the corresponding asymptotic variances. Since the generalized Robertson inequality gets stronger for increasing $s$, and every extremal point of the corresponding boundary is attained by $\psi(\alpha)$, this will prove the above statement. Moreover, by discarding the components $\psi_n(\alpha)$ with $n \ge 2s + 1$ and renormalizing, we get a sequence of spin-$s$ states well approximating $\psi(\alpha)$ as $s$ goes to infinity.

With this in mind, we will prove the above statement in two steps: 

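(i) On the one hand we have to verify that $\lim_{s\to\infty}\frac{1}{\sqrt{s}}L_+|\psi(\alpha)\rangle = \sqrt{2}\,a|\psi(\alpha)\rangle$ and $\lim_{s\to\infty}\frac{1}{\sqrt{s}}L_-|\psi(\alpha)\rangle = \sqrt{2}\,a^*|\psi(\alpha)\rangle$, which is true if $\psi(\alpha)$ is in the domain of $a^*a$. On the other hand we have to show that the term $\frac{w_3}{s}(s - a^*a - \zeta)^2$ from (53), with $\zeta = \langle\psi(\alpha)|s - a^*a|\psi(\alpha)\rangle$, vanishes for $\psi(\alpha)$ as $s$ goes to infinity. Both requirements are fulfilled if the moments $\langle\psi(\alpha)|a^*a|\psi(\alpha)\rangle$ and $\langle\psi(\alpha)|(a^*a)^2|\psi(\alpha)\rangle$ are finite.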

In the Holstein-Primakoff occupation basis $|n\rangle_{HP}$ these moments are given by series of the form

$$\sum_{n=0}^{\infty} n^c\,p_n = \sum_{k=0}^{\infty} (2k)^c\,p_{2k}, \qquad c = 1, 2, \tag{59}$$
FIG. 8. Comparison of the occupation number distribution with the numerical calculations.


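and can be computed as derivatives using the generating function

$$\sum_{k=0}^{\infty}\binom{2k}{k}x^k = \frac{1}{\sqrt{1 - 4x}} \tag{60}$$

of the probability distribution $p_{2k}$ (58). By straightforward calculations we get

$$\langle\psi(\alpha)|a^*a|\psi(\alpha)\rangle = \frac{(1-\alpha)^2}{4\alpha}, \tag{61}$$

$$\langle\psi(\alpha)|(a^*a)^2|\psi(\alpha)\rangle = \frac{3(1-\alpha)^4}{16\alpha^2} + \frac{(1-\alpha)^2}{2\alpha}, \tag{62}$$

which is finite for $\alpha > 0$.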

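(ii) Now the asymptotic variances for $\psi(\alpha)$ can be determined. Since for $\psi(\alpha)$ the operators $L_1/\sqrt{s}$, $L_2/\sqrt{s}$ and $L_3/s$ converge to $X$, $P$ and a multiple of the identity, respectively, we obtain

$$\lim_{s\to\infty}\frac{1}{s}\Delta^2_{\psi(\alpha)}(L_1) = \Delta^2_{\psi(\alpha)}(X) = \frac{1}{2\alpha}, \tag{63}$$

$$\lim_{s\to\infty}\frac{1}{s}\Delta^2_{\psi(\alpha)}(L_2) = \Delta^2_{\psi(\alpha)}(P) = \frac{\alpha}{2}, \tag{64}$$

$$\lim_{s\to\infty}\frac{1}{s}\Delta^2_{\psi(\alpha)}(L_3) = \lim_{s\to\infty}\frac{1}{s}\Delta^2_{\psi(\alpha)}(s - a^*a) = 0. \tag{65}$$

This set of variance triples $\bigl(\frac{1}{2\alpha}, \frac{\alpha}{2}, 0\bigr)$ saturates the asymptotic generalized Robertson bound (49). Moreover, they describe the extremal boundary curves of the associated uncertainty region (see the proof of Lemma 2). □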
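The moments (61) and (62), and the saturation of (49) by the variances (63) to (65), can be checked from the distribution (58). In this sketch (plain Python; names hypothetical) the $p_{2k}$ are generated by their ratio $p_{2k+2}/p_{2k} = x\cdot 2(2k+1)/(k+1)$ with $x = \bigl((1-\alpha)/(2+2\alpha)\bigr)^2$, which avoids overflowing central binomial coefficients, and the series (59) is truncated after a finite number of terms.

```python
import math

def even_occupation_probs(alpha, kmax=400):
    """p_{2k} from (58), k = 0, ..., kmax - 1, generated by the ratio
    p_{2k+2}/p_{2k} = x * 2(2k+1)/(k+1) with x = ((1-alpha)/(2+2alpha))^2."""
    x = ((1 - alpha) / (2 + 2 * alpha)) ** 2
    p = 2 * math.sqrt(alpha) / (1 + alpha)
    probs = []
    for k in range(kmax):
        probs.append(p)
        p *= x * 2 * (2 * k + 1) / (k + 1)
    return probs

def occupation_moments(alpha, kmax=400):
    """Truncated series (59) for c = 1, 2: returns (<a*a>, <(a*a)^2>)."""
    probs = even_occupation_probs(alpha, kmax)
    mean = sum(2 * k * p for k, p in enumerate(probs))
    second = sum((2 * k) ** 2 * p for k, p in enumerate(probs))
    return mean, second

def scaled_variances(alpha):
    """Asymptotic variance triple (63)-(65) on the 1/s scale."""
    return 1 / (2 * alpha), alpha / 2, 0.0
```

As an independent consistency check, $a^*a = (X^2 + P^2 - 1)/2$, so the mean occupation must also equal $(\frac{1}{2\alpha} + \frac{\alpha}{2} - 1)/2$, which reproduces (61).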


III. PREPARATION UNCERTAINTY: SPECIAL TOPICS 
A. The vector model and moment problems 

This may be a good place to comment on the so-called vector model of angular momentum, as it was suggested 
by Old Quantum Theory. It still seems to be quite popular in teaching, although theoreticians tend to deride 
it as ridiculously classical and obviously inconsistent. Indeed, its two-particle version gives manifestly false 
predictions even for spin-1/2, as witnessed by Bell’s (CHSH) inequality. Since any local classical model fails 
this test, not much can be learned about angular momentum from this observation. Therefore we consider here 
only the one-particle version, and try to sort out how far it can be trusted. 

The basic rationale of the vector model is shown in FIG. 9: Angular momentum is thought of as a classical random variable taking values on a sphere of radius $r_s = \sqrt{s(s+1)}$. For an eigenstate $|m\rangle$ the corresponding classical distribution is supposed to be concentrated at latitude $m$, and uniform with respect to rotations around the 3-axis. The expectation value of this distribution is $(0,0,m)$. Moreover, its matrix of second moments is also diagonal, since the coordinate axes are clearly the inertial axes of a mass uniformly distributed on a circle of fixed latitude. One readily checks that all second moments are the same as for the corresponding quantum state, namely $\langle L_1^2\rangle = \langle L_2^2\rangle = (s(s+1) - m^2)/2$ and $\langle L_3^2\rangle = m^2$. This can be generalized:
FIG. 9. The well-known vector model of angular momentum for s = 2.


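Proposition 5. For any quantum state $\rho$ on $\mathcal{H}$ there is a classical probability distribution $\mu$ on the sphere of radius $\sqrt{s(s+1)}$ which has the same first and second moments as the angular momentum in $\rho$, i.e.,

$$m_j = \int \mu(dx)\,x_j = \operatorname{tr}\rho L_j \quad\text{and}\quad M_{jk} = \int \mu(dx)\,x_jx_k = \operatorname{Re}\operatorname{tr}\rho L_jL_k. \tag{66}$$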

For the proof we only need a characterization of the moments $(m_j, M_{jk})$ of probability measures on a sphere of radius $r$, which turns out to be quite simple. This in turn provides an immediate proof of Proposition 5, since the quantum moments $M_{jk} = \operatorname{Re}\operatorname{tr}\rho L_jL_k$, $m_j = \operatorname{tr}\rho L_j$ obviously have the required properties with radius $r^2 = s(s+1)$.

Lemma 6. Let $m_j \in \mathbb{R}$ for $j = 1,2,3$, and $(M_{jk})_{j,k=1}^{3}$ a real symmetric $3\times3$-matrix. Then these numbers are the first and second moments of a probability distribution on the sphere of radius $r$ if and only if $\operatorname{tr}M = r^2$ and the covariance matrix $M_{jk} - m_jm_k$ is positive semi-definite.

Proof. Necessity is obvious because covariance matrices are always positive semi-definite and the function $v \mapsto \sum_j v_j^2 = r^2$ is constant on the sphere. For the converse consider the set $K$ of moments $(m, M)$ of probability measures on the sphere. This is a compact convex set, which we can think of as embedded into $(3 + 6 - 1)$-dimensional real space, because the real symmetric matrix $M$ is specified by 6 parameters, and we have an additional linear constraint $\operatorname{tr}M = r^2$. By the separation theorems for compact convex sets the set $K$ is therefore completely characterized by a collection of affine inequalities

$$f(m, M) = \operatorname{tr}AM - b\cdot m + \gamma \ge 0, \tag{67}$$

where $A$ is real symmetric, $b \in \mathbb{R}^3$ with the dot indicating the scalar product, and $\gamma \in \mathbb{R}$. The functionals for which these inequalities have to be satisfied are precisely those for which the above inequality holds for all pure probability measures, i.e., for $M_{jk} = v_jv_k$, $m_j = v_j$ for some $v \in \mathbb{R}^3$ on the sphere. In this case we slightly abuse notation and write $f(m, M) = f(v)$.

Not all inequalities are needed to characterize $K$, but only the extremal ones, which furnish a minimal subset from which all the others follow as linear combinations with positive scalar factors. In particular, we can assume that $f$ is not strictly positive, so it has a zero $f(u) = 0$, which then also has to be a minimum. The extremality condition gives $2Au - b = 2\lambda u$, where $\lambda \in \mathbb{R}$ is a Lagrange multiplier. This determines $b$, and from $f(u) = 0$ we get $\gamma$, so that we can rewrite

$$f(v) = (v - u)\cdot A(v - u) + 2\lambda\,u\cdot(v - u). \tag{68}$$

Now since $u, v$ lie on a sphere of radius $r$, we can write $2u\cdot(v - u) = -(v - u)^2 + (v^2 - u^2) = -(v - u)^2$, so we can combine the two terms, and obtain again the form (68), with $A$ modified by a multiple of the identity, and $\lambda = 0$.

It remains to determine all real symmetric matrices $A$ such that $(v - u)\cdot A(v - u) \ge 0$ whenever $u, v$ lie on a sphere of radius $r$. Equivalently, $\xi\cdot A\xi \ge 0$ for all multiples $\xi$ of vectors of the form $v - u$. But this set is dense in $\mathbb{R}^3$. Hence the desired condition is just the positive semi-definiteness of $A$. The resulting inequality for $(m, M)$ can be rewritten in terms of the covariance matrix $V_{jk} = M_{jk} - m_jm_k$ as

$$f(m, M) = \operatorname{tr}VA + (m - u)\cdot A(m - u) \ge 0. \tag{69}$$

Since the second term is always non-negative, the positive semi-definiteness of $V$ is sufficient for all these inequalities. This shows the sufficiency of the conditions stated in the Lemma. □
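Proposition 5 and Lemma 6 lend themselves to a direct check. The sketch below (NumPy assumed; helper names our own) first compares the moments of an $L_3$ eigenstate $|m\rangle$ with those of the uniform distribution on the circle of latitude $m$ in the vector model, and then verifies the two conditions of Lemma 6, $\operatorname{tr}M = s(s+1)$ and $M - mm^T \ge 0$, for random mixed states. It does not construct the measure $\mu$ itself, which the proof above does not do either.

```python
import numpy as np

def spin_matrices(s):
    """Spin-s matrices L1, L2, L3 in the L3 eigenbasis ordered m = s, ..., -s."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    lm = lp.conj().T
    return [(lp + lm) / 2, (lp - lm) / 2j, np.diag(m).astype(complex)]

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def quantum_moments(rho, L):
    """First moments m_j = tr(rho L_j) and M_jk = Re tr(rho L_j L_k), cf. (66)."""
    m = np.array([np.trace(rho @ Lj).real for Lj in L])
    M = np.array([[np.trace(rho @ Lj @ Lk).real for Lk in L] for Lj in L])
    return m, M

def satisfies_lemma6(m, M, r2, tol=1e-9):
    """Conditions of Lemma 6: tr M = r^2 and M - m m^T positive semi-definite."""
    V = M - np.outer(m, m)
    return abs(np.trace(M) - r2) < tol and np.linalg.eigvalsh(V).min() > -tol

def latitude_circle_moments(s, m, n_phi=64):
    """Moments of the uniform distribution on the circle of latitude m on the
    sphere of radius sqrt(s(s+1)), i.e., the vector model for |m>."""
    radius = np.sqrt(s * (s + 1) - m ** 2)
    phi = 2 * np.pi * np.arange(n_phi) / n_phi   # equally spaced: exact for these trig moments
    pts = np.stack([radius * np.cos(phi), radius * np.sin(phi), np.full(n_phi, m)], axis=1)
    return pts.mean(axis=0), pts.T @ pts / n_phi
```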
Let us make some remarks, which all fit into a fruitful analogy with the phase space case, i.e., the case of two canonical operators $P, Q$, and moment problems posed in the respective contexts.

1. The phase space analogue of Prop. 5 is the statement that for any quantum state the first and second 
moments can also be realized by a classical probability distribution on phase space. Of course, not all 
classically allowed first and second moments can arise in this way: This is just the theme of preparation 
uncertainty relations. 

2. The classical probability measure $\mu$ is not uniquely defined by $\rho$. For example, the density operator $\rho = (2s+1)^{-1}\mathbb{1}$ can either be represented by the uniform distribution on the sphere, or by an equal-weight mixture of the distributions with constant latitude $m$ (in any direction). In the phase space case it is well known that with the given, quantum-realizable moments one can always find a Gaussian state, which is defined as the distribution with the maximal entropy given those moments. The same idea also works for angular momentum, and it gives probability densities which are the exponential of a quadratic form in the variables. In contrast to the phase space case, when approaching eigenstates (any direction, any $m$) this entropy will go to $-\infty$, since for eigenstates only the singular measures depicted in FIG. 9 can be used.

3. Prop. 5 is certainly false if we include higher than second moments. For example, consider a pure qubit state with $m = +1/2$. Without loss of generality we can choose the measure $\mu$ invariant under rotations around the 3-axis. Since $\mu$ must be concentrated on $m = 1/2$, this uniquely fixes the measure $\mu$, and hence the moments to all orders. Now consider a direction $e$ which is at an angle strictly between $0$ and $\pi/2$ to $e_3$, and write $c = e\cdot e_3$. Then the quantum expectation of $(e\cdot L)^3 = e\cdot L/4$ is $c/8$, but the classical expectation of $(x\cdot e)^3$ is $(3c - 2c^3)/8$, which is larger for $0 < c < 1$, reflecting the non-linearity of the cube function.

4. The quantum analogue of the classical Hamburger moment problem would be to reconstruct a quantum state from the set of moments, i.e., the expectations of the monomials in the basic operators ($P, Q$, or $L_1, L_2, L_3$). Commutation relations impose some constraints on these moments, so that in the end only monomials like $L_1^{n_1}L_2^{n_2}L_3^{n_3}$ need to be considered. Of course, the expectation values of such operators will generally be complex numbers. Can we do the reconstruction for arbitrary states in the spin-$s$ representation? Indeed, we can, and it is actually much easier than in the phase space case since only finitely many moments suffice. The basic observation is that the moments fix all expectations on the von Neumann algebra $\mathcal{A}$ generated by the $L_i$. Because the representation is irreducible, the commutant of this algebra consists of the multiples of the identity. Hence $\mathcal{A}$ must be the full matrix algebra, and the state is uniquely determined. That finitely many moments suffice is clear because $\dim\mathcal{A} < \infty$.

5. Noncommutative moment problems are plagued by “operator ordering” issues. But in some sense we have 
already adopted a standard “symmetrized” solution for operator ordering, namely to form moments only 
of the operators e-L for all fixed e. This is analogous in the phase space case to considering the moments 
of linear combinations of P and Q. Now, famously, the full distributions of all such combinations are 
correctly rendered by the Wigner distribution function, which is itself hardly ever positive [16]. The 
analogy to the angular momentum case is immediate. So what do we get if we accept “quasi-probability 
distributions”? Can every state be represented like that? This is answered by the following proposition. 

Proposition 7. Let $\rho$ be a quantum state in the spin-$s$ representation. Then there is a unique tempered distribution $W_\rho$ on $\mathbb{R}^3$, which is formally real, has support in a ball of radius $s$ and satisfies, for all $e$ and $n \in \mathbb{N}$,

$$\int d\eta\, W_\rho(\eta)\,(e\cdot\eta)^n = \operatorname{tr}\bigl(\rho\,(e\cdot L)^n\bigr). \tag{70}$$

Proof. We can compute the Fourier transform of $W_\rho$ directly from (70), by multiplying with $(ik)^n/n!$ and summing over $n$. This turns the left side into the Fourier integral over $W_\rho$, allowing the sum to be evaluated also on the right hand side:

$$\widehat{W}_\rho(\mathbf{k}) = \int d\eta\, W_\rho(\eta)\,e^{i\mathbf{k}\cdot\eta} = \operatorname{tr}\bigl(\rho\,e^{i\mathbf{k}\cdot L}\bigr), \tag{71}$$

where $\mathbf{k} = ke$. Strictly speaking this computation should be regularized by multiplying with an arbitrary test function before summation, but this would lead to the same explicit representation of the Fourier transform $\widehat{W}_\rho$ as a bounded $C^\infty$-function. This shows that the desired tempered distribution $W_\rho$ is essentially unique, and can be defined for every $\rho$. It is formally real, because $\widehat{W}_\rho(-\mathbf{k}) = \overline{\widehat{W}_\rho(\mathbf{k})}$. For the claim about the support we invoke the distributional version of the Paley-Wiener-Schwartz Theorem [13, Thm. 7.3.1]. According to that theorem we need to show only that for real vectors $\mathbf{k}, \kappa$ the estimate

$$\Bigl|\operatorname{tr}\bigl(\rho\,e^{i(\mathbf{k} + i\kappa)\cdot L}\bigr)\Bigr| \le C\,e^{s|\kappa|} \tag{72}$$
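holds for some constant $C$. Treating the sum in the exponential by the Trotter formula, which we may, because these are finite dimensional matrices, we get

$$\Bigl\|e^{i(\mathbf{k} + i\kappa)\cdot L}\Bigr\| = \lim_{N\to\infty}\Bigl\|\bigl(e^{i(\mathbf{k}/N)\cdot L}\,e^{-(\kappa/N)\cdot L}\bigr)^N\Bigr\| \le \lim_{N\to\infty}\bigl(e^{\|(\kappa/N)\cdot L\|}\bigr)^N = e^{s|\kappa|}, \tag{73}$$

since the first factor is unitary and $\|\kappa\cdot L\| = s|\kappa|$. Because $|\operatorname{tr}\rho X| \le \|X\|$, this implies the desired estimate with $C = 1$. □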

This Proposition is remarkable in comparison to Prop. 5: If we insist on positivity but require only the first two moments to be correct, the vector model requires a sphere of radius $\sqrt{s(s+1)}$. On the other hand, if we waive instead the positivity of the classical distribution, we are forced to use a ball of radius $s$.
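The Fourier representation (71) and the estimate (72) can be probed numerically. This sketch (NumPy assumed; names hypothetical) uses a simple scaling-and-squaring Taylor approximation for the matrix exponential, adequate for these small matrices but not a general-purpose replacement for a library routine. It checks that $\widehat W_\rho(-\mathbf{k}) = \overline{\widehat W_\rho(\mathbf{k})}$, that $|\widehat W_\rho(\mathbf{k})| \le 1$ for real $\mathbf{k}$, and that $|\operatorname{tr}(\rho\,e^{i(\mathbf{k}+i\kappa)\cdot L})| \le e^{s|\kappa|}$, i.e., (72) with $C = 1$.

```python
import numpy as np

def spin_matrices(s):
    """Spin-s matrices L1, L2, L3 in the L3 eigenbasis ordered m = s, ..., -s."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    lm = lp.conj().T
    return [(lp + lm) / 2, (lp - lm) / 2j, np.diag(m).astype(complex)]

def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def expm(A, order=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=complex)
    norm = np.linalg.norm(A, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0 ** squarings                 # now ||B||_1 <= 1/2
    result = np.eye(A.shape[0], dtype=complex)
    term = np.eye(A.shape[0], dtype=complex)
    for j in range(1, order + 1):
        term = term @ B / j
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def w_hat(rho, L, z):
    """tr(rho exp(i z.L)) for a real or complex 3-vector z, cf. (71) and (72)."""
    zL = sum(zj * Lj for zj, Lj in zip(z, L))
    return np.trace(rho @ expm(1j * zL))
```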

One might ask at this point, in continuation of the above analogies to the phase space case, whether there are “classical” states, for which the Wigner function $W_\rho$ is a positive function. However, this is easily seen to be impossible. Indeed, when the marginal of a proper probability distribution has support on $\{-s, \ldots, s\}$, the measure itself has to have support on a union of hyperplanes $\{\eta \mid \eta\cdot e = m\}$. But these families of hyperplanes, drawn for various $e$, have empty intersection, contradicting the normalization of the measure.

The main use of Wigner functions on phase space is the visualization of quantum states. Unfortunately, the 
much more singular nature of W p for angular momentum will prevent these Wigner functions from becoming 
similarly popular. This irregularity can be tamed by replacing on the right hand side of (71) 

$$e^{i\mathbf{k}\cdot L} \;\longrightarrow\; e^{ik_1L_1}\,e^{ik_2L_2}\,e^{ik_3L_3}. \tag{74}$$

The corresponding distribution is then a sum of point measures sitting on a finite cubical grid [7]. This may 
actually be useful in quantum information, where it relates to a discrete phase space structure over the cyclic 
group of d = 2s + 1 elements. However, for angular momentum proper we find this breaking of rotational 
symmetry abhorrent. 


B. Entropic uncertainty 

In this section we will have a look at the entropic uncertainty relations. Given a measurement of a hermitian operator $A = \sum_i a_iP_i$, with eigenprojectors $P_i$, the probability of obtaining the $i$th measurement outcome will be denoted by $\pi_i(\rho) = \operatorname{tr}(\rho P_i)$ and the associated probability distribution by $\pi(\rho) = \{\pi_1(\rho), \ldots, \pi_d(\rho)\}$. Then the output entropy of $A$ in the state $\rho$ is defined as the Shannon entropy of $\pi(\rho)$,

$$H(A,\rho) := H(\pi(\rho)) = -\sum_{i=1}^{d}\pi_i(\rho)\log_d\bigl(\pi_i(\rho)\bigr), \tag{75}$$

which serves as an uncertainty measure. Note that we normalize the Shannon entropy by its maximal value $\log d$, so that all occurring entropies are bounded by 1. In contrast to the variance, the entropy of a probability distribution does not change by permuting or rescaling the measurement outcomes, and so only depends on the choice of the $P_i$ (up to permutations) and not on the eigenvalues $a_i$. This implies that an entropic uncertainty relation, which constrains the output entropies of two (for simplicity non-degenerate) observables $A, B$, only depends on the unitary operator $U$ connecting the respective eigenbases. A well-known bound in this setting is the general Maassen-Uffink bound [20]

$$H(A,\rho) + H(B,\rho) \ge -2\log_d(c), \tag{76}$$

$$c = c(U) = \max_{ij}|U_{ij}|. \tag{77}$$

For angular momentum measurements in two orthogonal directions the connecting unitary operators are rotations, i.e., given as a rotation by $\pi/2$ around the third coordinate axis according to the spin-$s$ representation of SU(2) on $\mathbb{C}^{2s+1}$. For arbitrary angles these representations are called Wigner-D matrices [2] and will also be used in Sect. IV E. It turns out that the Maassen-Uffink bound is in general not optimal, but describes precisely the uncertainty region for $s \to \infty$.

For spin s = 1, the uncertainty region can still be reliably investigated by parameterizing the set of pure 
states in the L 3 eigenbasis. Numerics suggests that real valued states and their permutations characterize the 
lower bound of the uncertainty region. The resulting uncertainty regions for two and three components are 
shown in FIG. 10, which should be compared directly to FIG. 4(left) and FIG. 3(right). 

The marked lines in FIG. 10 correspond to states of the form $|\phi(t)\rangle := (\ldots, \sin(t), \ldots)$ written in the $L_3$-eigenbasis, and their permutations. They correspond exactly to the extremal curves found for variance uncertainty as shown in FIG. 3. Remarkably, however, the ordering of “uncertainties” turns out to be different in the two cases. Consider the $L_2$-eigenstates, which are shown in the left panels of both 2D Figures 4 and 10 as the points on the horizontal axis. The respective $L_1$-probability distributions are $(1/4, 1/2, 1/4)$ for $m = \pm1$ and $(1/2, 0, 1/2)$ for $m = 0$. The former distribution has the larger entropy ($\frac{3}{2}\log_3 2 > \log_3 2$) but the smaller variance ($1/2 < 1$).

FIG. 10. Entropic uncertainty regions for measurements of two and three orthogonal spin components ($s = 1$). The orange line in the left panel is the Maassen-Uffink bound (76), which is tight only in the case $s = 1$.
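The $s = 1$ numbers quoted above can be reproduced directly. In this sketch (NumPy assumed; names hypothetical) we diagonalize $L_1$ and $L_2$, compute the $L_1$ distribution of each $L_2$ eigenstate, and evaluate the normalized entropy (75) and the variance. The last check confirms that for the $m = 0$ state the entropy sum equals the Maassen-Uffink value $-2\log_3(1/\sqrt{2}) = \log_3 2$.

```python
import numpy as np

def spin_matrices(s):
    """Spin-s matrices L1, L2, L3 in the L3 eigenbasis ordered m = s, ..., -s."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    lm = lp.conj().T
    return [(lp + lm) / 2, (lp - lm) / 2j, np.diag(m).astype(complex)]

def entropy_base_d(p, d):
    """Shannon entropy normalized by log d, as in (75); zero probabilities are skipped."""
    p = np.asarray(p)
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log(p)) / np.log(d))

def l1_statistics_of_l2_eigenstates(s=1):
    """For each L2 eigenstate: (m, L1 distribution, H(L1), variance of L1)."""
    L1, L2, _ = spin_matrices(s)
    d = L1.shape[0]
    a1, V1 = np.linalg.eigh(L1)          # L1 eigenvalues in ascending order -s, ..., s
    a2, V2 = np.linalg.eigh(L2)
    rows = []
    for m, v in zip(a2, V2.T):
        p = np.abs(V1.conj().T @ v) ** 2
        var = float(np.sum(p * a1 ** 2) - np.sum(p * a1) ** 2)
        rows.append((round(float(m)), p, entropy_base_d(p, d), var))
    return rows
```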

For larger $s$ this inversion no longer holds. FIG. 11 shows this effect. Because we can exchange the roles of $L_1$ and $L_2$ by a unitary rotation, the uncertainty diagrams are symmetric with respect to the diagonal. Therefore the optimal linear bound must be of the form (76), with a suitable $c$. The entropy sums for the eigenstates $|m\rangle$ with minimal and maximal $|m|$ are shown in FIG. 11. For all half-integer $s$ and for integer $s > 7$ the coherent state $|s\rangle$ produces not only the lowest variance, but also the lowest entropy. FIG. 11 also shows the Maassen-Uffink bound, which has been computed by Sánchez-Ruiz [30]. It is attained for the overlap of two spin coherent states and is given by


$$c^2 = 2^{-2s}\binom{2s}{\lfloor s + 1/2\rfloor}. \tag{78}$$

$s = 1$ seems to be the only case in which the bound is tight.
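Formula (78) can be compared with a direct computation of $c(U)$ in (77). In this sketch (NumPy assumed; names hypothetical) $U$ is obtained as the overlap matrix between the numerically computed $L_1$ and $L_2$ eigenbases, and the comparison is run for a few small values of $s$.

```python
import math
import numpy as np

def spin_matrices(s):
    """Spin-s matrices L1, L2, L3 in the L3 eigenbasis ordered m = s, ..., -s."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    lm = lp.conj().T
    return [(lp + lm) / 2, (lp - lm) / 2j, np.diag(m).astype(complex)]

def mu_constant_numeric(s):
    """c^2 = max_ij |U_ij|^2 for the unitary U connecting the L1 and L2 eigenbases."""
    L1, L2, _ = spin_matrices(s)
    _, V1 = np.linalg.eigh(L1)
    _, V2 = np.linalg.eigh(L2)
    return float(np.max(np.abs(V1.conj().T @ V2)) ** 2)

def mu_constant_formula(s):
    """c^2 = 2^{-2s} binom(2s, floor(s + 1/2)) from (78); floor(s + 1/2) = (2s + 1)//2."""
    n = int(round(2 * s))
    return math.comb(n, (n + 1) // 2) / 2 ** n
```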

However, for large s, the bound is again optimal, as the following result shows. 

Proposition 8. In the limit $s \to \infty$ the optimal lower bound on the entropic uncertainty region of $L_1$ and $L_2$ is given by the Maassen-Uffink inequality, which converges to

$$H(L_1,\rho) + H(L_2,\rho) \ge \frac{1}{2}. \tag{79}$$


Proof. As a first step we will compute the asymptotic behavior of the bound $-\log_{2s+1}[c^2]$, with $c$ given in (78). Expanding the central binomial coefficient in factorials and using the Stirling approximation up to order $\log_{2s+1}(s)$ gives

$$\lim_{s\to\infty} -\log_{2s+1}[c^2] = \lim_{s\to\infty} -\log_{2s+1}\left[2^{-2s}\binom{2s}{\lfloor s + 1/2\rfloor}\right] = \frac{1}{2}, \tag{80}$$

which proves the convergence of the Maassen-Uffink bound to the right hand side of (79).


In order to show that this bound describes the asymptotic uncertainty region, we have to exhibit sequences of states saturating every point on the boundary curve. We first show that the endpoint $(0, 1/2)$ is asymptotically attained by the $L_1$-eigenstates $|s\rangle$: The output entropy of $|s\rangle$ in the $L_1$ basis is always zero, whereas the output entropy in the $L_2$ basis can be evaluated as $H(L_2, |s\rangle) = H(L_1, U_3^*|s\rangle)$, with $U_3 = e^{-i\frac{\pi}{2}L_3}$. Because $|s\rangle$ has maximal quantum number $m$, the probability amplitudes of $U_3|s\rangle$ are given by the last column of the Wigner-D matrix $U_3 = D(0, \pi/2, 0)$. By expanding the Wigner-D matrix in terms of Jacobi polynomials [2], one can verify that $\pi_m(U_3|s\rangle) = |\langle m|U_3|s\rangle|^2$ is a binomial distribution in $m$, symmetric on the domain $\{-s, \ldots, s\}$. The entropy of this distribution is $\frac{1}{2}\log_{2s+1}(\pi e s)$ up to terms that vanish as $s \to \infty$, which converges to $\frac{1}{2}$ as $s$ goes to infinity; hence (79) can be saturated:

$$\lim_{s\to\infty}\bigl(H(L_1,|s\rangle),\, H(L_2,|s\rangle)\bigr) = \bigl(0, \tfrac{1}{2}\bigr). \tag{81}$$
FIG. 11. Entropic uncertainty sum of the coherent state $|s\rangle$ (blue squares) and the state $|0\rangle$ (green triangles) in comparison to the Maassen-Uffink bound (orange circles), for integer spin (left) and half-integer spin (right).


Finally we have to construct states saturating every point of this bound. For $|s\rangle$, still in the eigenbasis of $L_1$, and arbitrary $s$, we define a family of unit vectors $|\psi_\alpha\rangle$ as

$$|\psi_\alpha\rangle = C_\alpha\bigl(\cos(\alpha)|s\rangle + \sin(\alpha)\,U_3|s\rangle\bigr). \tag{82}$$

The $L_1$-outcome probability distribution associated with this vector is

$$\pi_m(\psi_\alpha) \approx \cos(\alpha)^2\,\delta_{ms} + \sin(\alpha)^2\,|\langle m|U_3|s\rangle|^2, \tag{83}$$


because the two probability distributions have practically no overlap for large $s$, and hence also $C_\alpha \approx 1$. For the same reason we can evaluate the entropy of $\pi(\psi_\alpha)$ as a sum, obtaining

$$H(L_1, |\psi_\alpha\rangle) \approx \sin^2(\alpha)\,H(L_1, U_3|s\rangle) + \frac{H_2\bigl(\cos^2(\alpha), \sin^2(\alpha)\bigr)}{\log_2(2s+1)} \approx \frac{1}{2}\sin^2(\alpha), \tag{84}$$


where $H_2$ is the binary entropy function. In the $L_2$-basis the roles of the two terms in $\psi_\alpha$ are exchanged, and we get

$$H(L_2, |\psi_\alpha\rangle) \approx \frac{1}{2}\cos^2(\alpha). \tag{85}$$

Hence the sequence $\psi_\alpha$ realizes the point $\frac{1}{2}\bigl(\sin^2(\alpha), \cos^2(\alpha)\bigr)$ on the boundary. □
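The asymptotic ingredients of this proof can be checked numerically: the limit (80) of the Maassen-Uffink bound, the binomial form of the $L_1$ distribution of $U_3|s\rangle$, and the normalized entropy of that binomial distribution. The sketch below (NumPy and the standard `math` module assumed; names hypothetical) evaluates binomial coefficients through `math.lgamma` so that very large $s$ can be handled; note that the convergence to $1/2$ is only logarithmic in $s$.

```python
import math
import numpy as np

def spin_matrices(s):
    """Spin-s matrices L1, L2, L3 in the L3 eigenbasis ordered m = s, ..., -s."""
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    lp = np.zeros((d, d), dtype=complex)
    for i in range(1, d):
        lp[i - 1, i] = np.sqrt(s * (s + 1) - m[i] * (m[i] + 1))
    lm = lp.conj().T
    return [(lp + lm) / 2, (lp - lm) / 2j, np.diag(m).astype(complex)]

def mu_bound(s):
    """-log_{2s+1}(c^2) with c^2 from (78), evaluated via lgamma."""
    n = int(round(2 * s))
    k = (n + 1) // 2
    log_c2 = -n * math.log(2) + math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return -log_c2 / math.log(n + 1)

def rotated_coherent_distribution(s):
    """L1-outcome distribution of U3|s>, where |s> is the top L1 eigenvector and
    U3 = exp(-i pi/2 L3); outcomes ordered m = -s, ..., s."""
    L1, _, L3 = spin_matrices(s)
    _, V1 = np.linalg.eigh(L1)
    top = V1[:, -1]                                        # L1 eigenvalue +s
    U3 = np.diag(np.exp(-0.5j * np.pi * np.diag(L3).real))  # L3 is diagonal here
    return np.abs(V1.conj().T @ (U3 @ top)) ** 2

def binomial_entropy_normalized(s):
    """Entropy of Binomial(2s, 1/2), normalized by log(2s+1) as in (75)."""
    n = int(round(2 * s))
    logp = [math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1) - n * math.log(2)
            for j in range(n + 1)]
    return -sum(math.exp(lp) * lp for lp in logp) / math.log(n + 1)
```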


IV. MEASUREMENT UNCERTAINTY 
A. Introduction 

As mentioned in the introduction, a measurement uncertainty relation is a quantitative bound on the accuracy 
with which two observables can be measured approximately on the same device. Already in Kennard’s 1927 
paper [18] it is clearly stated that in quantum mechanics the notion of a “true value” loses its meaning, so that 
we should not think of “measurement error” as the deviation of the observed value from a true value. What we 
can always do, however, is to compare the performance of two measuring devices, one of which is a (perhaps hypothetical) “ideal” measurement and the other an approximate one. The only requirement is that these two
measurements give outputs which lie in the same space X and whose distance is somehow defined. A good 
approximate measurement is then one which will give, on every input state, almost the same output distribution 
as the ideal one. This operational focus on the output distributions is also in keeping with the way one would 
detect a disturbance of the system. Consider how we discover that trying to detect through which of two slits 
the particles pass disturbs them: The interference pattern, i.e., the output distribution of the interferometer is 
changed and fringes washed out. 

Two related ways to build up a quantitative comparison of distributions, and thereby a quantitative approximation measure between observables, were introduced in the papers [3, 5] and applied to the standard situation of a position and a momentum operator. These two notions, called calibration error and metric error, will be described in the following subsections. Either way we get a natural figure of merit for an observable $F$ jointly measuring two or more components of angular momentum. In fact, we will only treat the case where $F$ jointly measures all components. By this we simply mean an observable whose output is not a single number but a vector $\eta$. From this, one derives a “marginal measurement” $F_e$ of the $e$-component by post-processing, i.e., by taking the $e$-component $e\cdot\eta$ of the output vector as the output of $F_e$. These marginals can then be compared with the standard projection valued measurement of the angular momentum component $e\cdot L$. When $D(G, E)$ is the quantity chosen to characterize the error of an observable $G$ with respect to the ideal reference $E$, we get, in our special case,

$$D_{\max}(F) = \max_e D(F_e, e\cdot L) \tag{86}$$

as the desired figure of merit. This is the quantity which we will minimize. But first we have to be more explicit about the two choices for the error quantity $D(G, E)$. This will be done in the next two subsections.


B. Calibration Error 

The simpler one assumes that the “ideal” observable is projection valued, so that we can produce states which 
have a very narrow distribution around one of its eigenvalues (or points in the continuous spectrum). In other 
words, we have some states available which come close to having a “true value” in the sense that the ideal 
distribution is sharp around a known value. A good approximate measurement should then have an output 
distribution which is also well peaked around this value. Thus we only have to compare probability distributions to $\delta$-function like distributions, i.e., point measures $\delta_x$ with $x \in X$. This is straightforward, and we set, for any probability measure $\mu$ on the space $X$ and $1 \le \alpha < \infty$,


$$
D_\alpha(\mu, \delta_x) = \Big( \int \mu(dy)\, D(y,x)^\alpha \Big)^{1/\alpha}, \tag{87}
$$


where $D$ under the integral is the given metric on $X$. This could be called the power-$\alpha$ deviation of $\mu$ from the point $x$. We are mostly interested in quadratic deviations, i.e., $\alpha = 2$. However, in this section we keep $\alpha$ general, which causes no extra difficulty, but makes clear which numbers “2” arise directly from the role of the averaging power $\alpha$ in (87) and similar equations.
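As a small illustration of (87), the following sketch evaluates the power-$\alpha$ deviation of a discrete probability measure, assuming $X = \mathbb{R}$ with the metric $D(x,y) = |x-y|$. The function name `power_deviation` is our own and is not taken from the references.

```python
def power_deviation(support, weights, x, alpha=2.0):
    """Power-alpha deviation D_alpha(mu, delta_x) of Eq. (87) for the discrete
    measure mu = sum_i weights[i] * delta_{support[i]} on the real line,
    using the metric D(y, x) = |y - x| (an assumption of this sketch)."""
    if alpha < 1:
        raise ValueError("the text assumes 1 <= alpha < infinity")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must form a probability distribution")
    integral = sum(w * abs(y - x) ** alpha for y, w in zip(support, weights))
    return integral ** (1.0 / alpha)


# A measure split evenly between -1 and +1 lies at distance 1 from 0 for every alpha,
print(power_deviation([-1.0, 1.0], [0.5, 0.5], 0.0))   # 1.0
# whereas its quadratic deviation from +1 is sqrt(0.5 * 2**2) = sqrt(2).
print(power_deviation([-1.0, 1.0], [0.5, 0.5], 1.0, alpha=2.0))
```

A sharply peaked measure (all weight at $x$) has deviation $0$, which is the situation the calibration states below are meant to approximate.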

We now apply this to the output distribution $F_\rho$ obtained by measuring the observable $F$ on the input state $\rho$, and to its ideal counterpart $E_\rho$. The $\varepsilon$-deviation or $\varepsilon$-calibration error of the observable $F$ with respect to the ideal observable $E$ is


$$
\Delta^{\varepsilon}_{\alpha}(F,E) = \sup\big\{ D_\alpha(F_\rho, \delta_x) \,\big|\, D_\alpha(E_\rho, \delta_x) \le \varepsilon \big\}, \tag{88}
$$

where the supremum is over all $x \in X$ and “calibration states” $\rho$, which are sharply concentrated on $x$ up to quality $\varepsilon$. Note that as a function of $\varepsilon$ this expression is decreasing as $\varepsilon \to 0$, because the supremum is taken over smaller and smaller sets. Therefore the limit exists, and we define the calibration error of $F$ with respect to $E$ by


$$
\Delta^{c}_{\alpha}(F,E) = \lim_{\varepsilon \to 0} \Delta^{\varepsilon}_{\alpha}(F,E). \tag{89}
$$

For observables $E$ with discrete spectrum (like angular momentum components) we can also take $\varepsilon = 0$ in (88), and directly get $\Delta^{c}_{\alpha}(F,E) = \Delta^{0}_{\alpha}(F,E)$.


C. Metric error 

A possible issue with the calibration error is that it describes the performance of $F$ only on a very special subclass of states. On the one hand this makes it easier to determine it experimentally, but on the other hand we get no guarantee about the performance of the device on general inputs. Classically this problem does not arise, because broad distributions can be represented as mixtures of sharply peaked ones, and this allows us to give an estimate also on the similarity of output distributions for general inputs. The form of this estimate gives a good hint towards how to define the distance of probability distributions both of which are diffuse. Indeed, suppose $\rho$ is an input state such that $\rho = \int \mu(dx)\, \rho_x$, where $\rho_x$ is an $\varepsilon$-calibration state at the point $x$, and $\mu$ is an arbitrary probability measure. Then we can define a measure $\gamma$ on $X \times X$ by $\gamma(dx\,dy) = \mu(dx)\, F_{\rho_x}(dy)$, which gives the probability of the joint event of having a “true value” $x \in dx$ and finding $y \in dy$. If one integrates out the $x$ variable one gets the output distribution $F_\rho$, because $F_\rho$ is linear in $\rho$, and if one integrates out $y$ one gets $\mu$, because each $F_{\rho_x}$ is normalized. To within $\varepsilon$ this is the output distribution $E_\rho$, and with known calibration error we get the bound

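$$
\int \gamma(dx\,dy)\, D(x,y)^\alpha = \int \mu(dx) \int F_{\rho_x}(dy)\, D(x,y)^\alpha = \int \mu(dx)\, D_\alpha(F_{\rho_x}, \delta_x)^\alpha \le \Delta^{\varepsilon}_{\alpha}(F,E)^\alpha . \tag{90}
$$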

This suggests the following definitions. For two probability distributions $\mu$ and $\nu$ on $X$ we define a coupling to be a measure $\gamma$ on $X \times X$ whose first marginal is $\mu$ and whose second marginal is $\nu$. The set of couplings will be denoted by $\Gamma(\mu,\nu)$, and is always non-empty because it contains the product measure. We then define the Wasserstein $\alpha$-distance of $\mu$ and $\nu$ as

$$
D_\alpha(\mu, \nu) = \inf_{\gamma \in \Gamma(\mu,\nu)} \Big( \int \gamma(dx\,dy)\, D(x,y)^\alpha \Big)^{1/\alpha}. \tag{91}
$$

This is also called a transport distance, because of the following interpretation, which goes back to Gaspard Monge, who considered the building of fortifications in the 18th century. We consider $\mu$ and $\nu$ as two distributions of earth, and the task of a builder who wants to transform distribution $\mu$ into distribution $\nu$. The workers are paid by the bucket and the power $\alpha$ of the distance travelled with each bucket (giving a bonus pay on long distances). The builder’s plan is precisely the coupling $\gamma$, saying how many units are to be taken from $x$ to $y$, and the integral is the total cost. The infimum is just the price of the optimal transport plan. The theory of such metrics is well developed, and we recommend the book of Villani [32] on the subject, but in the present context we only need some simple observations.
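In one dimension the infimum in (91) can be computed without solving a linear program: for $\alpha \ge 1$ and $D(x,y) = |x-y|$ on the real line, the monotone coupling, which matches the sorted supports from left to right, is a standard optimal transport plan. The sketch below implements this for discrete measures; the function name `wasserstein_1d` and the restriction to the real line are assumptions of this illustration, not part of the text.

```python
def wasserstein_1d(xs, p, ys, q, alpha=2.0, tol=1e-15):
    """Wasserstein alpha-distance D_alpha(mu, nu) of Eq. (91) for the discrete
    measures mu = sum_i p[i] delta_{xs[i]} and nu = sum_j q[j] delta_{ys[j]} on
    the real line with D(x, y) = |x - y| and alpha >= 1.

    On the line the monotone coupling (sorted supports matched left to right)
    is optimal for these costs, so the coupling gamma is built greedily."""
    a = sorted(zip(xs, p))                 # (location, mass) of mu, ascending
    b = sorted(zip(ys, q))                 # (location, mass) of nu, ascending
    i = j = 0
    ra, rb = a[0][1], b[0][1]              # mass left at a[i] / still missing at b[j]
    cost = 0.0
    while i < len(a) and j < len(b):
        moved = min(ra, rb)                # gamma(a[i], b[j]) in the monotone plan
        cost += moved * abs(a[i][0] - b[j][0]) ** alpha
        ra -= moved
        rb -= moved
        if ra <= tol:                      # bucket at a[i] emptied: move on
            i += 1
            if i < len(a):
                ra = a[i][1]
        if rb <= tol:                      # site b[j] filled: move on
            j += 1
            if j < len(b):
                rb = b[j][1]
    return cost ** (1.0 / alpha)


# Shifting a measure by 2 costs exactly 2, as the transport picture suggests.
print(wasserstein_1d([0.0, 1.0], [0.5, 0.5], [2.0, 3.0], [0.5, 0.5]))   # 2.0
```

When the second measure is a point mass the greedy plan reduces to the product coupling, reproducing (87).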

With a metric between probability distributions we define the distance of two observables as the worst case 
distance of their output distributions: 


$$
D_\alpha(F,E) := \sup_{\rho}\, D_\alpha(F_\rho, E_\rho). \tag{92}
$$

For the connection between this metric error and the calibration error introduced above, note first that when $\nu$ is the point measure $\delta_x$ and $\mu$ is arbitrary, the product is the only coupling, and the two definitions of $D_\alpha(\mu,\delta_x)$ from equations (87) and (91) coincide. Therefore, if $D_\alpha(E_\rho,\delta_x) \le \varepsilon$, we have

$$
D_\alpha(F_\rho, \delta_x) \le D_\alpha(F_\rho, E_\rho) + D_\alpha(E_\rho, \delta_x) \le D_\alpha(F,E) + \varepsilon. \tag{93}
$$

Taking the supremum in (88) and letting $\varepsilon \to 0$, we hence have

$$
\Delta^{c}_{\alpha}(F,E) \le D_\alpha(F,E). \tag{94}
$$


Intuitively, this merely indicates that for calibration we test deviations only on the small subset of highly concentrated states. Then (90) is a partial converse: if $\rho$ has a convex decomposition into $\varepsilon$-concentrated states, then $D_\alpha(F_\rho,\mu) \le \Delta^{\varepsilon}_{\alpha}(F,E)$, and since $D_\alpha(\mu,E_\rho) \le \varepsilon$ we get $D_\alpha(F_\rho,E_\rho) \le \Delta^{\varepsilon}_{\alpha}(F,E) + \varepsilon$. In the classical case such a decomposition always exists, so we have equality $D_\alpha(F,E) = \Delta^{c}_{\alpha}(F,E)$. In the quantum case, however, we not only have convex mixtures of sharply concentrated states but also coherent superpositions. Using these it is easy to build examples in which (94) is strict.

There is a second “quasi-classical” setting in which calibration and metric error coincide, and this will actually be used below. This is the case when $F$ and $E$ differ only by classical noise generated in the measuring apparatus. More formally, this is described by a transition probability kernel $P(x,dy)$, which is, for every $x$, the probability measure in $y$ describing the output of $F$, given that $E$ has produced the value $x$. We can think of this as classical probabilistic post-processing or noise. It is, of course, not necessary that $F$ actually operates in two steps, but only that it could be simulated in this way, i.e., that the relation $F(dy) = \int E(dx)\, P(x,dy)$ holds. This is enough to conclude $\Delta^{c}_{\alpha}(F,E) = D_\alpha(F,E)$, and to give a formula for both in terms of the size of the noise kernel $P$. In the following Lemma the $E$-essential supremum of a measurable function $f$ with respect to a measure $E$ (denoted $E\text{-}\operatorname{ess\,sup}_{x\in X} f(x)$) is the supremum of all $\lambda$ such that the upper level set $\{x \mid f(x) > \lambda\}$ has non-zero $E$-measure. In our application $E$ is the spectral measure of a component $e\cdot L$, so it is concentrated on the finite set $\{-s,\dots,s\}$. The essential supremum is then simply the maximum of $f$ over this set.

Lemma 9. Let $E$ be a projection valued observable on a separable metric space $(X,D)$. Let $F$ be an observable arising from $E$ by post-processing with a transition probability kernel $P$. Then, for all $\alpha$,

$$
\Delta^{c}_{\alpha}(F,E) = D_\alpha(F,E) = \Big( E\text{-}\operatorname*{ess\,sup}_{x\in X} \int P(x,dy)\, D(x,y)^\alpha \Big)^{1/\alpha}. \tag{95}
$$


Proof. Let I, II, III be the three terms in this equation. Then I ≤ II is given by (94). To show II ≤ III, note that for any state $\rho$ we get a coupling $\gamma$ between $F_\rho$ and $E_\rho$ by $\gamma(dx\,dy) = E_\rho(dx)\, P(x,dy)$. Hence

$$
D_\alpha(F_\rho,E_\rho)^\alpha \le \int \gamma(dx\,dy)\, D(x,y)^\alpha = \int E_\rho(dx) \int P(x,dy)\, D(x,y)^\alpha .
$$

We introduce the function $f(x) = \big(\int P(x,dy)\, D(x,y)^\alpha\big)^{1/\alpha}$ and split the integral with respect to $E_\rho$ into an integral over $X_> = \{x \mid f(x) > t\}$ and an integral over its complement $X_<$, where $t > E\text{-}\operatorname{ess\,sup}_x f(x)$. Then, by definition of the essential supremum, $E_\rho$ vanishes on $X_>$, and on $X_<$ the integrand is bounded by $t^\alpha$. Hence $D_\alpha(F_\rho,E_\rho)^\alpha \le t^\alpha$. Taking the supremum over $\rho$ and the $\alpha$th root we get $D_\alpha(F,E) \le t$ for every $t >$ III, proving II ≤ III.

It remains to show that III ≤ I. This time we pick a $t < E\text{-}\operatorname{ess\,sup}_x f(x)$. By definition, this means that $E(\{x \mid f(x) > t\}) \ne 0$. Now let $\varepsilon > 0$. Because we have assumed $X$ to be separable, it is covered by a countable collection of $\varepsilon$-balls $B_\varepsilon(x_i) = \{y \in X \mid D(x_i,y) < \varepsilon\}$, $i = 1,2,\dots$. Hence, due to countable additivity of $E$,

$$
0 \ne E(\{x \mid f(x) > t\}) \le \sum_i E\big(B_\varepsilon(x_i) \cap \{x \mid f(x) > t\}\big). \tag{96}
$$

Hence for some $z = x_i$ we have $E(B_\varepsilon(z) \cap \{x \mid f(x) > t\}) \ne 0$. Since $E$ is projection valued, we find a state $\rho$ such that the probability measure $E_\rho(dx)$ is concentrated on this set. In particular, $D_\alpha(E_\rho,\delta_z) \le \varepsilon$. Moreover, $m^\alpha = \int E_\rho(dx)\, f(x)^\alpha \ge t^\alpha$ and

$$
D_\alpha(F_\rho,\delta_z)^\alpha = \int E_\rho(dx) \int P(x,dy)\, D(z,y)^\alpha = \int E_\rho(dx)\, g(x)^\alpha, \tag{97}
$$

where the last equality defines a function $g$. We interpret these quantities as $L^\alpha$-norms with respect to $E_\rho$, i.e., $m = \|f\|_\alpha$ and $D_\alpha(F_\rho,\delta_z) = \|g\|_\alpha$. Then, by the triangle inequality,

$$
D_\alpha(F_\rho,\delta_z) \ge \|f\|_\alpha - \|f-g\|_\alpha \ge t - \|f-g\|_\alpha. \tag{98}
$$

To get an upper bound on $\|f-g\|_\alpha$, note that $\|f-g\|_\alpha^\alpha$ is the integral over $E_\rho(dx)$ of

$$
\Big| \Big(\int P(x,dy)\, D(x,y)^\alpha\Big)^{1/\alpha} - \Big(\int P(x,dy)\, D(z,y)^\alpha\Big)^{1/\alpha} \Big|^\alpha. \tag{99}
$$

Again we can read the expression inside the absolute value as a difference of norms, namely the $L^\alpha$-norms of the functions $h_x(y) = D(x,y)$ and $h_z(y) = D(z,y)$ with respect to integration by $P(x,dy)$, where $x$ is considered a fixed parameter. By the triangle inequality for the metric $D$ we have $|h_x - h_z| \le D(x,z)$, independently of $y$. Since the transition kernel $P$ is a probability measure with respect to $dy$, we find that (99) equals $\big|\|h_x\|_\alpha - \|h_z\|_\alpha\big|^\alpha \le \|h_x - h_z\|_\alpha^\alpha \le D(x,z)^\alpha$. Hence in (98) we have

$$
\|f-g\|_\alpha^\alpha \le \int E_\rho(dx)\, D(x,z)^\alpha \le \varepsilon^\alpha, \tag{100}
$$

because by construction $E_\rho$ has support in $B_\varepsilon(z)$. Combining the estimates we get $D_\alpha(F_\rho,\delta_z) \ge t - \varepsilon$. The supremum over all calibrating states can only increase the left hand side, and on the right we use that the only condition on $t$ was that $t < E\text{-}\operatorname{ess\,sup}_x f(x)$, so that

$$
\Delta^{\varepsilon}_{\alpha}(F,E) \ge E\text{-}\operatorname*{ess\,sup}_{x} f(x) - \varepsilon. \tag{101}
$$

Now III ≤ I follows in the limit $\varepsilon \to 0$. □
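For a finite outcome set, Lemma 9 can be checked numerically. The sketch below assumes a hypothetical noise kernel `P` on the eigenvalues $m \in \{-1,0,1\}$ of a spin-1 component. Because $F(dy) = \int E(dx)\,P(x,dy)$, the output distributions of any input state depend only on its diagonal $\langle m|\rho|m\rangle$, so random diagonal weights suffice. The code compares the right-hand side of (95), here a plain maximum over $m$, with transport distances $D_2(F_\rho,E_\rho)$ computed from the monotone coupling on the real line (a standard optimal plan for $\alpha \ge 1$). All names are illustrative.

```python
import random


def wasserstein_1d(xs, p, ys, q, alpha=2.0, tol=1e-15):
    """Monotone-coupling transport distance on the real line (optimal for alpha >= 1)."""
    a, b = sorted(zip(xs, p)), sorted(zip(ys, q))
    i = j = 0
    ra, rb, cost = a[0][1], b[0][1], 0.0
    while i < len(a) and j < len(b):
        moved = min(ra, rb)
        cost += moved * abs(a[i][0] - b[j][0]) ** alpha
        ra, rb = ra - moved, rb - moved
        if ra <= tol:
            i += 1
            ra = a[i][1] if i < len(a) else 0.0
        if rb <= tol:
            j += 1
            rb = b[j][1] if j < len(b) else 0.0
    return cost ** (1.0 / alpha)


def lemma9_bound(kernel, alpha=2.0):
    """Right-hand side of Eq. (95): every eigenvalue m has non-zero spectral
    weight, so the essential supremum is a plain maximum over m."""
    return max(sum(prob * abs(m - y) ** alpha for y, prob in row.items()) ** (1.0 / alpha)
               for m, row in kernel.items())


def output_distributions(weights, kernel):
    """E_rho and F_rho = sum_m <m|rho|m> P(m, .) as (support, masses) pairs."""
    e_support = sorted(weights)
    f = {}
    for m, w in weights.items():
        for y, prob in kernel[m].items():
            f[y] = f.get(y, 0.0) + w * prob
    f_support = sorted(f)
    return ((e_support, [weights[m] for m in e_support]),
            (f_support, [f[y] for y in f_support]))


# Hypothetical kernel: outputs are shrunk towards 0 and occasionally sign-flipped.
P = {-1: {-0.8: 0.9, 0.8: 0.1},
     0: {-0.5: 0.5, 0.5: 0.5},
     1: {0.8: 0.9, -0.8: 0.1}}

bound = lemma9_bound(P)   # max(0.6, 0.5, 0.6) = 0.6 up to rounding
rng = random.Random(0)
worst = 0.0
for _ in range(1000):
    raw = {m: rng.random() for m in P}
    total = sum(raw.values())
    (ex, ep), (fx, fp) = output_distributions({m: v / total for m, v in raw.items()}, P)
    worst = max(worst, wasserstein_1d(fx, fp, ex, ep))
print(bound, worst)       # worst <= bound, as II <= III in the proof requires
```

The bound is reached by the eigenstate with the largest per-$m$ noise, i.e., by a calibration state, which is the content of III ≤ I.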

In [5] a special case of this Lemma was used to show $\Delta^{c} = D$ for the position and momentum marginals of a covariant phase space measurement. In that case the noise kernel $P$ is even translation invariant, i.e., the output of the marginal observable can be simulated by just adding some state-independent noise to the output of the ideal position or momentum observable. Such translation invariance makes no sense in the case of angular momentum, since the range $m \in \{-s,\dots,s\}$ of outputs of the ideal observable is bounded. This is why the above generalization was needed, in which the noise can depend on the ideal output value. The reason for the existence of a post-processing kernel, however, will be the same as in the phase space case: the covariance of the joint measurement. Roughly speaking, this makes the marginal corresponding to $e\cdot L$ invariant under rotations around the $e$-axis, which in an irreducible representation means that it must be a function of $e\cdot L$. It is therefore crucial to argue that the optimal joint measurement is covariant, which will be done in the next subsection.


D. Covariant Observables 

Consider a general observable $F$ with outcome space $X$. Suppose some group $G$ acts on $X$, with the action written $(g,x) \mapsto gx$ as usual. Suppose that the group also acts as a symmetry group of the quantum system. That is, there is a representation $g \mapsto U_g$ of $G$ by operators $U_g$, which are unitary or antiunitary, and satisfy the group law (possibly up to a phase factor). The observable $F$ is then called covariant if $U_g^* F(S) U_g = F(g^{-1}S)$ for all $g \in G$ and every measurable set $S$. In other words, shifting the input state by $U_g$ shifts the entire output distribution by $g$. For our purposes it will be convenient to express this in terms of an action $F \mapsto T_g F$ of $G$ on the set of observables:

$$
(T_g F)(S) = U_g\, F(g^{-1}S)\, U_g^* . \tag{102}
$$

Then the covariant observables are precisely those for which $T_g F = F$ for all $g \in G$.

For angular momentum the group will be the rotation group with its action on 3-vectors ($X = \mathbb{R}^3$). The representation $U$ is then the spin-$s$ representation, which for half-integer $s$ is a representation of the rotation group only up to a factor $\pm 1$. Alternatively, we can take $G$ as the covering group SU(2). Since the covariant observables are exactly the same, this choice is completely equivalent.
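To make the action of rotations on $L$ concrete, the sketch below builds the spin-$s$ matrices with numpy and checks, for a rotation about the 2-axis, that conjugation rotates the vector operator. In the convention $U = \exp(-i\theta L_2)$ used here (an assumption of this sketch, with $\hbar = 1$), one has $U L_3 U^* = \cos\theta\, L_3 + \sin\theta\, L_1$. Relations of this kind underlie the covariance condition; the helper names are hypothetical.

```python
import numpy as np


def spin_matrices(s):
    """Spin-s matrices (L1, L2, L3) in the basis |m>, m = s, s-1, ..., -s."""
    dim = int(round(2 * s)) + 1
    m = s - np.arange(dim)                        # eigenvalues of L3, descending
    raising = np.zeros((dim, dim))
    for k in range(1, dim):
        # L+ |m_k> = sqrt(s(s+1) - m_k(m_k+1)) |m_k + 1>, and m_k + 1 = m_{k-1}
        raising[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    l1 = (raising + raising.T) / 2
    l2 = (raising - raising.T) / 2j
    l3 = np.diag(m)
    return l1, l2, l3


def rotation_about_2_axis(s, theta):
    """U = exp(-i theta L2), via the spectral decomposition of the Hermitian L2."""
    _, l2, _ = spin_matrices(s)
    w, v = np.linalg.eigh(l2)
    return v @ np.diag(np.exp(-1j * theta * w)) @ v.conj().T


s, theta = 3 / 2, 0.7
l1, l2, l3 = spin_matrices(s)
u = rotation_about_2_axis(s, theta)
print(np.allclose(l1 @ l2 - l2 @ l1, 1j * l3))                                    # True
print(np.allclose(u @ l3 @ u.conj().T, np.cos(theta) * l3 + np.sin(theta) * l1))  # True
```

The same construction, with matrix elements $\langle n|U|m\rangle$, gives the Wigner d-matrix used later in this section.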

Covariance is certainly a reasonable condition to impose on a “good” observable, so it would make sense to 
study uncertainty relations just for these. However, there is no need for such an ad hoc restriction, because the 
minimum of uncertainty over all observables is anyway attained on a covariant one. The basic reason for this is 
that our figure of merit (86) does not single out a direction in space, so that it is invariant under the action T g . 
We therefore only have to show that there is no symmetry breaking, i.e., the symmetric variational problem has 
a symmetric solution. This will be done in the following Lemma. 

Lemma 10. For any observable $F$ with outcome set $X = \mathbb{R}^3$, and $1 \le \alpha < \infty$, let

$$
D_{\max}(F) = \max_{e}\, D_\alpha(F^e, e\cdot L), \tag{103}
$$

$$
\Delta_{\max}(F) = \max_{e}\, \Delta^{c}_{\alpha}(F^e, e\cdot L). \tag{104}
$$

Then

1. both these functionals are invariant under the action $T$, and $D_{\max}(F)^\alpha$ and $\Delta_{\max}(F)^\alpha$ are convex;

2. each of the infima $\inf_F D_{\max}(F)$ and $\inf_F \Delta_{\max}(F)$ is independent of whether it is taken over all observables or just the covariant ones.

Proof. By definition of $F^e$ we have $(T_R F)^e = U_R F^{R^{-1}e} U_R^*$. When $E^e$ denotes the spectral measure of $e\cdot L$, the relation $U_R L U_R^* = R^{-1} L$ similarly implies $U_R E^{R^{-1}e} U_R^* = E^e$. Moreover, due to the supremum over all states, $D_\alpha(F,E)$ does not change if both observables are rotated with the same unitary. Hence

$$
D_\alpha\big((T_R F)^e, E^e\big) = D_\alpha\big(U_R F^{R^{-1}e} U_R^*,\, U_R E^{R^{-1}e} U_R^*\big) = D_\alpha\big(F^{R^{-1}e}, E^{R^{-1}e}\big). \tag{105}
$$

Hence the supremum over $e$ is unchanged and $D_{\max}(T_R F) = D_{\max}(F)$. For $\Delta_{\max}$ note that we can carry out the limit $\varepsilon \to 0$ directly, because $e\cdot L$ has finite spectrum and states with $D_\alpha(E^e_\rho,\delta_x)$ small are norm-close to eigenstates with $x = m$. Hence

$$
\Delta^{c}_{\alpha}(F^e, e\cdot L)^\alpha = \max\big\{ D_\alpha(F^e_{|\psi\rangle\langle\psi|}, \delta_m)^\alpha \,\big|\, m, \psi : e\cdot L\,\psi = m\psi \big\} = \max_{m,\psi} \int \langle\psi|F(dx)|\psi\rangle\, |e\cdot x - m|^\alpha . \tag{106}
$$

If we now insert the definition of $T_R F$, rewrite the maximum over $\psi$ in terms of $\psi' = U_R^*\psi$, and substitute $x' = R^{-1}x$ in the integral, we find $\Delta^{c}_{\alpha}((T_R F)^e, e\cdot L) = \Delta^{c}_{\alpha}(F^{R^{-1}e}, E^{R^{-1}e})$, and again $\Delta_{\max}(T_R F) = \Delta_{\max}(F)$.

Convexity for $D_\alpha^\alpha$ follows from the corresponding property of transport distances. Indeed, let $\mu = \sum_k \lambda_k \mu_k$ and $\nu = \sum_k \lambda_k \nu_k$ be convex combinations of measures with the same weights $\lambda_k$, and let $\gamma_k$ be a coupling between $\mu_k$ and $\nu_k$. Then $\gamma = \sum_k \lambda_k \gamma_k$ is a coupling of the convex combinations. Moreover,

$$
D_\alpha(\mu,\nu)^\alpha \le \int \gamma(dx\,dy)\, D(x,y)^\alpha = \sum_k \lambda_k \int \gamma_k(dx\,dy)\, D(x,y)^\alpha . \tag{107}
$$

Here the first inequality holds because $D_\alpha$ is defined as the infimum over couplings. Then, if we take the infimum over each of the $\gamma_k$, we get $D_\alpha(\mu,\nu)^\alpha \le \sum_k \lambda_k D_\alpha(\mu_k,\nu_k)^\alpha$. In particular, $F \mapsto D_\alpha(F^e_\rho, E^e_\rho)^\alpha$ is convex. Since the pointwise supremum of convex functions is convex, $D_{\max}^\alpha$, as the supremum with respect to $\rho$ and $e$, is convex. The same observation, applied to (106) with an additional supremum over $e$, shows that $\Delta_{\max}^\alpha$ is likewise convex.

Given any observable $F$ we now form its average $\bar F = \int dR\, T_R F$ with respect to the normalized Haar measure $dR$. Then $\bar F$ is covariant and $D_{\max}(\bar F)^\alpha \le \int dR\, D_{\max}(T_R F)^\alpha = D_{\max}(F)^\alpha$. Taking the infimum here, and using that $\bar F$ is covariant and that all covariant observables are such averages ($\bar F = F$), gives the second inequality in

$$
\inf_{F} D_{\max}(F) \le \inf_{F\ \text{covariant}} D_{\max}(F) \le \inf_{F} D_{\max}(F), \tag{108}
$$

while the first follows trivially because the covariant observables are a subset. Hence the two infima are the same, and the same argument also applies to $\Delta_{\max}$, proving the second claim. □
We could have included also a statement that the infima in this Lemma are all attained. The argument for that is the compactness of the set of observables (in a suitable topology) and the lower semi-continuity of $D_{\max}$ and $\Delta_{\max}$, which follows, like convexity, from the representation of these functionals as pointwise suprema of continuous functionals. However, since we will later explicitly exhibit minimizers anyhow, we will skip the abstract arguments. We also remark that one of the main difficulties in the position/momentum case [5] does not arise here: in contrast to the group of phase space translations, the rotation group is compact, so the average is an integral and not an “invariant mean”, which has the potential of producing singular measures with some support on infinitely far away points.

The main importance of this Lemma is to make the variational problem much more tractable. For covariant 
observables we have a fairly explicit parameterization, which allows us to explicitly compute the minimizers. 
In contrast, for the seemingly easier case of joint measurement of just two components covariance gives only a 
very weak constraint, and we were not able to complete the minimization. 

To develop the form of covariant observables, let us first consider the case when the output vectors have a fixed length $r$. A plausible value would be $r = s$, but we will leave this open. In this case $X$ reduces to a sphere of radius $r$, which is a homogeneous space for the rotation group. We could thus apply the covariant version [29, 33] of the Naimark [22] dilation theorem to obtain a complete classification. But we do not need this machinery in this elementary case. Note first that $\operatorname{tr} F(dx)$ is an invariant measure on the sphere, and all probability densities $\operatorname{tr} \rho F(dx)$ are bounded by this measure (since $\rho$ is bounded). Hence, for every $\rho$ there is a bounded probability density with respect to the uniform measure. Since this depends linearly on $\rho$, it is given by a bounded operator depending on $x$. The $x$-dependence is then completely resolved by covariance, and it is sufficient to know this density at one point, say the north pole $ru$. Moreover, by covariance this density must commute with the rotations around $u$, and is hence a linear combination of the eigenprojections $|n\rangle\langle n|$ of $u\cdot L$ with $n \in \{-s,\dots,s\}$. The only choices to be made are hence the coefficients $F_n$ of this linear combination. We write the resulting observable in terms of its integrals with an arbitrary function $h$ on the sphere:

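$$
\begin{aligned}
\int F(dx)\, h(x) &= (2s+1) \int dR\, U_R \Big(\sum_n F_n |n\rangle\langle n|\Big) U_R^*\, h(rRu) \\
&= (2s+1) \int \frac{\sin\theta\, d\theta\, d\phi}{4\pi} \sum_n F_n\, U_{\theta\phi} |n\rangle\langle n| U_{\theta\phi}^*\, h(r e_{\theta\phi}).
\end{aligned} \tag{109}
$$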

Here the first integral is over Haar measure on the rotation group (or SU(2)), whereas the second is expressed in polar coordinates on the sphere, with $e_{\theta\phi}$ the corresponding unit vector, and $U_{\theta\phi}$ some rotation taking the north pole $u$ to $e_{\theta\phi}$. It does not matter which rotation we choose, because $|n\rangle\langle n|$ is invariant with respect to rotations around the 3-axis. The two expressions are related by introducing Euler angles on the rotation group and integrating out the initial rotation around the 3-axis. The normalization factor $(2s+1)$ is chosen so that the constraints on $F_n$ are exactly $F_n \ge 0$ and $\sum_n F_n = 1$, i.e., the observable is represented as a convex combination of observables each using only one fixed $|n\rangle\langle n|$ as the density. What changes when $r$ is not fixed is simply that we get an additional integration over $r$, where the $F_n$ may also depend on $r$. Effectively, we get a probability measure $F_n(dr)$ on $\{-s,\dots,s\} \times \mathbb{R}_+$, and the second version of (109) just becomes

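$$
\int F(dx)\, h(x) = \sum_n \int F_n(dr)\, (2s+1) \int \frac{\sin\theta\, d\theta\, d\phi}{4\pi}\, U_{\theta\phi} |n\rangle\langle n| U_{\theta\phi}^*\, h(r e_{\theta\phi}). \tag{110}
$$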

The criterion for joint measurability does not depend on the full observable, but only on the marginals along the various directions $e$. It is one of the direct consequences of covariance, evident from the proof of Lemma 10, that $D_\alpha(F^e, E^e)$ and $\Delta^{c}_{\alpha}(F^e, E^e)$ do not depend on $e$. We will therefore only consider the case $e = u$ in the following. In (110) this just means that we specialize to functions of the form $h(x) = h_1(x\cdot u)$ with $h_1: \mathbb{R} \to \mathbb{R}$. Thus in the integrand we get $h_1(r e_{\theta\phi}\cdot u) = h_1(r\cos\theta)$, which no longer depends on $\phi$. We can therefore carry out the $\phi$-integration. The resulting operator will commute with rotations around the 3-axis, so we can express it as a linear combination of operators $|m\rangle\langle m|$:

$$
\begin{aligned}
\int F^u(dx)\, h_1(x) &= \sum_m |m\rangle\langle m| \int P(m,dx)\, h_1(x) \quad\text{with} \\
\int P(m,dx)\, h_1(x) &= \sum_n \int F_n(dr)\, (2s+1) \int_0^\pi \frac{\sin\theta\, d\theta}{2}\, |\langle m|U_\theta|n\rangle|^2\, h_1(r\cos\theta).
\end{aligned} \tag{111}
$$

The first line establishes the connection with the premise of Lemma 9: for covariant observables the marginals can be simulated by an exact measurement of $m$, followed by post-processing with the kernel $P$. Therefore, for covariant $F$ we have $D_{\max}(F) = \Delta_{\max}(F)$. By Lemma 10, the infimum of this quantity over covariant observables, say $\Delta_{\min}(s)$, coincides with the infimum over all observables. Therefore we can state the measurement uncertainty relations in the form

$$
D_{\max}(F) \ge \Delta_{\min}(s) \quad\text{and}\quad \Delta_{\max}(F) \ge \Delta_{\min}(s) \tag{112}
$$

for all observables $F$, whether covariant or not. We will now compute $\Delta_{\min}(s)$, and show that both minima are attained for a unique covariant observable.





E. Minimal Uncertainty 


While the above holds for arbitrary exponent $\alpha$, we will now restrict to the standard variance case, i.e., $\alpha = 2$. So far, we have derived that the optimal observable $F$ is covariant, leading to the parametrization (111). In particular, $F^e$ arises from $e\cdot L$ by a transition probability kernel, so that metric and calibration error coincide. In the sequel, we will therefore only consider the calibration error, which is easier to evaluate. By covariance the calibration error $\Delta^{c}_{2}(F^e, e\cdot L)$ is independent of $e$, so we can choose $e = e_3$. Observing that for discrete valued observables we can take $\varepsilon = 0$ in (88), we get from (111) the basic figure of merit


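$$
\begin{aligned}
\Delta^{c}_{2}(F^e, e\cdot L)^2 = \Delta^{c}_{2}(F^3, L_3)^2 &= \max_m \int \operatorname{tr}\big(|m\rangle\langle m| F(d\eta)\big)\, (e_3\cdot\eta - m)^2 \\
&= \max_m \sum_{n=-s}^{s} \int_0^\infty F_n(dr)\, (2s+1) \int_0^\pi \frac{\sin\theta\, d\theta}{2}\, |\langle n|U_\theta|m\rangle|^2\, (r\cos\theta - m)^2 .
\end{aligned} \tag{113}
$$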

Before calculating the optimal case, we introduce the following Lemma, which provides a more manageable expression $I(s,r,n,m)$ for the integral over $\theta$ (including the factor $2s+1$), such that (113) reads

$$
\Delta^{c}_{2}(F^3, L_3)^2 = \max_m \sum_{n=-s}^{s} \int_0^\infty F_n(dr)\, I(s,r,n,m).
$$


Lemma 11. For $s \ge 1$ the integral $I(s,r,n,m)$ can be written as

$$
I(s,r,n,m) = A_s(r,n) + m^2 B_s(r,n), \tag{114}
$$

where

$$
A_s(r,n) = \frac{r^2 s(s+1)\big(-2n^2 + 2s(s+1) - 1\big)}{s(s+1)(2s-1)(2s+3)} \tag{115}
$$

and

$$
B_s(r,n) = \frac{6n^2r^2 - 2nr\big(4s(s+1) - 3\big) + s(s+1)\big(-2r^2 + 4s(s+1) - 3\big)}{s(s+1)(2s-1)(2s+3)}. \tag{116}
$$

For $s = 1/2$ we have

$$
I(\tfrac12, r, n, m) = \frac{r^2}{3} - \frac{2rn}{3} + \frac14 . \tag{117}
$$


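Proof. Here we have to solve the integral

$$
I(s,r,n,m) = (2s+1) \int_0^\pi \frac{\sin\theta\, d\theta}{2}\, |d^s_{nm}(\theta)|^2\, (r\cos\theta - m)^2, \tag{118}
$$

where $d^s_{nm}(\theta) = \langle n|U_\theta|m\rangle$ is the small Wigner d-matrix [2]. First we expand $d^s_{nm}(\theta)$ in terms of the Jacobi polynomials:

$$
d^s_{nm}(\theta) = \sqrt{\frac{(s+m)!\,(s-m)!}{(s+n)!\,(s-n)!}}\, \Big(\sin\frac{\theta}{2}\Big)^{m-n} \Big(\cos\frac{\theta}{2}\Big)^{m+n} P^{(m-n,\, m+n)}_{s-m}(\cos\theta). \tag{120}
$$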


In the following we use a recurrence relation for the Jacobi polynomials. This three-term relation does not hold for $s = 1/2$, so that we have to treat this case separately: the integrals $I(\tfrac12, r, n, \pm\tfrac12)$ can indeed be calculated directly from the above expression, and the results are given in the statement of the Lemma. From now on we assume $s \ge 1$.

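We can simplify some case distinctions by introducing $k = \min(s+m, s-m, s+n, s-n)$ and substituting the integers $s-m$, $m-n$, $m+n$ arising in (120) according to Tab. I, which is possible due to the symmetries of the Wigner d-matrix [2]. Our expression then depends implicitly on $(s,m,n)$ through $(\mu,\nu,k)$:

$$
d^s_{nm}(\theta) = \pm \sqrt{\frac{k!\,(2s-k)!}{(k+\mu)!\,(k+\nu)!}}\, \Big(\sin\frac{\theta}{2}\Big)^{\mu} \Big(\cos\frac{\theta}{2}\Big)^{\nu} P^{(\mu,\nu)}_{k}(\cos\theta), \tag{121}
$$

where the case-dependent sign plays no role, since only $|d^s_{nm}(\theta)|^2$ enters (118).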
Substituting $x = \cos\theta$ yields

$$
I(s,r,n,m) = \kappa(s,k,\mu,\nu) \int_{-1}^{1} dx\, (1-x)^\mu (1+x)^\nu\, P^{(\mu,\nu)}_k(x)\, P^{(\mu,\nu)}_k(x)\, (rx - m)^2, \tag{122}
$$

where

$$
\kappa(s,k,\mu,\nu) = \frac{(2s+1)\, k!\,(2s-k)!}{2^{\mu+\nu+1}\,(k+\mu)!\,(k+\nu)!}. \tag{123}
$$


This integral can be solved by expanding the factor $(rx - m)^2$, using the Jacobi polynomial orthogonality relation [31]

$$
\int_{-1}^{1} (1-z)^a (1+z)^b\, P^{(a,b)}_m(z)\, P^{(a,b)}_n(z)\, dz = \underbrace{\frac{2^{a+b+1}\,\Gamma(n+a+1)\,\Gamma(n+b+1)}{n!\,(a+b+2n+1)\,\Gamma(a+b+n+1)}}_{\omega(n,a,b)}\, \delta_{mn} \tag{124}
$$

and the recurrence relation [31]

$$
z P^{(a,b)}_n(z) = \underbrace{\frac{b^2 - a^2}{(a+b+2n)(a+b+2n+2)}}_{\alpha(n,a,b)} P^{(a,b)}_n(z) + \underbrace{\frac{2(a+n)(b+n)}{(a+b+2n)(a+b+2n+1)}}_{\beta(n,a,b)} P^{(a,b)}_{n-1}(z) + \underbrace{\frac{2(n+1)(a+b+n+1)}{(a+b+2n+1)(a+b+2n+2)}}_{\gamma(n,a,b)} P^{(a,b)}_{n+1}(z). \tag{125}
$$


Then the expressions in the Lemma arise by simplifying the corresponding polynomials:

$$
\begin{aligned}
I(s,r,n,m) &= \kappa(s,k,\mu,\nu) \Big[ r^2 \big( \alpha(k,\mu,\nu)^2\, \omega(k,\mu,\nu) + \beta(k,\mu,\nu)^2\, \omega(k-1,\mu,\nu) + \gamma(k,\mu,\nu)^2\, \omega(k+1,\mu,\nu) \big) \\
&\qquad - 2rm\, \alpha(k,\mu,\nu)\, \omega(k,\mu,\nu) + m^2\, \omega(k,\mu,\nu) \Big] \\
&= A_s(r,n) + m^2 B_s(r,n).
\end{aligned} \tag{126}
$$

□ 
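Lemma 11 lends itself to a direct numerical check. The sketch below evaluates (118) in the variable $x = \cos\theta$ with Gauss-Legendre quadrature, computing $|d^s_{nm}(\theta)|^2$ from the matrix $\exp(-i\theta L_2)$ (only the modulus enters, so the rotation convention is immaterial), and compares with the closed forms (114)-(117). Since $|d^s_{nm}|^2 (rx-m)^2$ is a polynomial of degree $2s+2$ in $x$, the quadrature is exact up to rounding. The function names are our own.

```python
import numpy as np


def spin_l2(s):
    """L2 for spin s in the basis |m>, m = s, ..., -s, and the array of m values."""
    dim = int(round(2 * s)) + 1
    m = s - np.arange(dim)
    raising = np.zeros((dim, dim))
    for k in range(1, dim):
        raising[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    return (raising - raising.T) / 2j, m


def I_quadrature(s, r, n, m, npts=40):
    """Eq. (118) in x = cos(theta) by Gauss-Legendre quadrature. The integrand is a
    polynomial of degree 2s + 2 in x, so npts >= s + 2 nodes are exact up to rounding."""
    l2, _ = spin_l2(s)
    w, v = np.linalg.eigh(l2)
    x, wts = np.polynomial.legendre.leggauss(npts)
    i_n, i_m = int(round(s - n)), int(round(s - m))     # row of |n>, column of |m>
    total = 0.0
    for xk, wk in zip(x, wts):
        u = v @ np.diag(np.exp(-1j * np.arccos(xk) * w)) @ v.conj().T
        total += wk * abs(u[i_n, i_m]) ** 2 * (r * xk - m) ** 2
    return (2 * s + 1) * total / 2                       # sin(theta) d(theta)/2 = dx/2


def I_closed(s, r, n, m):
    """Lemma 11, Eqs. (114)-(116), valid for s >= 1."""
    S, D = s * (s + 1), (2 * s - 1) * (2 * s + 3)
    A = r**2 * S * (-2 * n**2 + 2 * S - 1) / (S * D)
    B = (6 * n**2 * r**2 - 2 * n * r * (4 * S - 3) + S * (-2 * r**2 + 4 * S - 3)) / (S * D)
    return A + m**2 * B


def I_half(r, n):
    """Lemma 11, Eq. (117), for s = 1/2 (independent of m)."""
    return r**2 / 3 - 2 * r * n / 3 + 0.25


for s in (1, 1.5, 2, 3):
    ms = [s - k for k in range(int(round(2 * s)) + 1)]
    err = max(abs(I_quadrature(s, r, n, m) - I_closed(s, r, n, m))
              for r in (0.0, 0.7, 2.3) for n in ms for m in ms)
    print(s, err)   # expected at the level of floating-point rounding
```

For example, for $s = n = m = 1$ and $r = 1$ both sides equal $2/5$, since then $|d^1_{11}|^2 = (1+x)^2/4$.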


We will use this Lemma to simplify the minimization over $m$. Moreover, the integral over $r$ and the sum over $n$ can be seen as taking a convex combination of two-dimensional vectors $(A_s(r,n), B_s(r,n))$. Hence optimizing $F$ can be analyzed geometrically in terms of the set of such pairs (see FIG. 13). This is solved in the next theorem, whose results are visualized in FIG. 12.

Theorem 12. The minimal measurement uncertainty $\Delta_{\min}(s)$ in the sense of (112) is attained at a unique covariant observable, for which $F_n(dr)$ is a point measure at $n = s$ and $r = r_{\min}(s)$, with

$$
r_{\min}(s) = \begin{cases} 1/2 & s = 1/2 \\ 5/4 & s = 1 \\ \big(2s - \sqrt{2s+3} + 3\big)/2 & s > 1 \end{cases}
\qquad\text{and}\qquad
\Delta_{\min}(s) = \begin{cases} 1/6 & s = 1/2 \\ 3/8 & s = 1 \\ \big(s - \sqrt{2s+3} + 2\big)/2 & s > 1. \end{cases} \tag{127}
$$

Except for $s = 1$, the maximum over $m$ in (113) is trivial for the optimal observable, i.e., the calibration error is the same for all calibration inputs $m$.

$$
\begin{array}{c|ccc}
\text{Case} & k & \mu & \nu \\ \hline
\text{I} & s+m & n-m & -n-m \\
\text{II} & s-m & m-n & n+m \\
\text{III} & s+n & m-n & -n-m \\
\text{IV} & s-n & n-m & n+m
\end{array}
$$

TABLE I. Wigner d-matrix substitution.

[Figure: $r_{\min}(s)/s$ and $\Delta_{\min}(s)$ plotted against $s$.]

FIG. 12. Optimal radii $r_{\min}(s)$ and measurement uncertainties $\Delta_{\min}$ according to Eq. (127). The radii are scaled by $s$, showing that for $1 \le s \le 5/2$ the outputs of the optimal observable are vectors of modulus $> s$. For larger $s$, the output vectors are not longer than $s$, and, after a minimum around $s = 27/2$, we have $r_{\min}(s)/s \to 1$. In both panels, the functional expression valid for $s > 1$ is plotted in blue.

Proof. We consider first the case $s > 1$. We reformulate the problem using that, by (114), $\Delta^{c}_{2}(F^3, L_3)^2$ is determined by a convex combination of the functions $A_s(r,n)$ and $B_s(r,n)$. Here we must find the best $n$ as well as the probability distribution $F_n(dr)$ for the worst $m$. We denote the convex set of all possible combinations by $\Omega \subset \mathbb{R}^2$, i.e.,


$$
\Omega := \Big\{ (a,b) \,\Big|\, a = \sum_n \int F_n(dr)\, A_s(r,n),\ b = \sum_n \int F_n(dr)\, B_s(r,n) \Big\} = \operatorname{conv}\big\{ (A_s(r,n), B_s(r,n)) \,\big|\, r \ge 0,\ n = -s,\dots,s \big\}. \tag{128}
$$


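All information about a possible observable is now contained in $(a,b) \in \Omega$. Furthermore, a maximum over $m$ is part of the definition of $\Delta^{c}_{2}(F^3, L_3)^2$, so we can rewrite it as a functional on $\Omega$:

$$
K: \Omega \to \mathbb{R}, \qquad K(a,b) = \begin{cases} a + s^2 b & \text{if } b \ge 0 \\ a & \text{if } b < 0 \text{ and } s \in \mathbb{N} \\ a + \tfrac14 b & \text{if } b < 0 \text{ and } s + \tfrac12 \in \mathbb{N}. \end{cases} \tag{129}
$$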
The problem is now to minimize the functional $K(a,b)$. Since, for general $n$ and $s$, $\Omega$ is hard to describe, we choose the following strategy, which is illustrated in the left panel of FIG. 13.

We will show that, for $s > 1$, $K$ takes its minimum at

$$
v = \big(A_s(r_{\min}(s), s),\, 0\big) = \Big(\tfrac12\big(s - \sqrt{2s+3} + 2\big),\, 0\Big) \tag{130}
$$


by constructing a line $\phi$ which separates the set $\Omega$ from the convex level set $K_v := \{(a,b) \in \mathbb{R}^2 \mid K(a,b) \le K(v)\}$. For this we take the line $\phi$ to be the tangent of the curve $r \mapsto (A_s(r,s), B_s(r,s))$ at the point $v$. The normal $u$ of $\phi$ is


$$
u = \left( \frac{2\sqrt{2s+3}}{2s^2 + 5s + 3},\ \frac{2s - \sqrt{2s+3} + 3}{2s+3} \right), \tag{131}
$$


and $\Phi := \{x \in \mathbb{R}^2 \mid x\cdot u \ge v\cdot u\}$ is the half plane above $\phi$. Now we show that $\Omega \subset \Phi$, i.e.,


$$
g_s(n,r) := \big(A_s(r,n), B_s(r,n)\big)\cdot u - \big(A_s(r_{\min}(s), s), 0\big)\cdot u \ge 0, \tag{132}
$$

with equality iff $n = s$ and $r = r_{\min}(s)$. Note that the function $g_s(n,r)$ is quadratic in $r$, so for verifying (132) it is sufficient to show that $\partial_r^2 g_s(n,r) > 0$ and $g_s(n,r_n) \ge 0$, where $r_n$ is the stationary point determined by $\partial_r g_s(n,r_n) = 0$.

Indeed, up to a positive factor, we have

$$
\begin{aligned}
\partial_r^2 g_s(n,r) &\propto 4s^2\sqrt{2s+3} - 4s^2 + 12n^2 - 4n^2\sqrt{2s+3} - 4s \\
&= 4\big((s^2 - n^2)(\sqrt{2s+3} - 3) + 2s^2 - s\big).
\end{aligned} \tag{133}
$$

Now for $s \ge 3$ the factor with the square root is non-negative, and since $|n| \le s$ we can bound the expression (133) from below by $4(2s^2 - s) > 0$. For $1 < s < 3$, the minimum with respect to $n$ is attained at $n = 0$, where (133) equals $4s\big(s\sqrt{2s+3} - s - 1\big)$, which is positive because $\sqrt{2s+3} > 2 > 1 + 1/s$.
FIG. 13. Left: Example for $s = 6$ with the $n = s$ curve marked in red. Right: Intersection of $K_v$ with $\phi$ in the $s = 1$ case.



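The stationary points $r_n$ can be straightforwardly computed as

$$
r_n = \frac{n(2s-1)\big(-2s + \sqrt{2s+3} - 3\big)}{2\Big(n^2\big(\sqrt{2s+3} - 3\big) + s\big(-s\sqrt{2s+3} + s + 1\big)\Big)}, \tag{134}
$$

with, up to a positive factor,

$$
g_s(n, r_n) \propto \frac{s^2 - n^2}{s\big(s\sqrt{2s+3} - s - 1\big) - n^2\big(\sqrt{2s+3} - 3\big)}. \tag{135}
$$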


The denominator of $g_s(n,r_n)$ in (135) is again the quadratic coefficient of the expression (132), i.e., (133), which was already shown to be positive, and the numerator $s^2 - n^2$ is non-negative. Hence $\Omega \subset \Phi$.

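Finally we have to certify that $K_v \cap \Phi = \{v\}$. Comparing the slope of $\phi$ with the slopes of the linear boundaries of $K_v$, we get the conditions

$$
\begin{aligned}
-4 &< -\frac{1 + \sqrt{2s+3}}{(1+s)^2} < -\frac{1}{s^2} && \text{for } s + \tfrac12 \in \mathbb{N}, \\
-\infty &< -\frac{1 + \sqrt{2s+3}}{(1+s)^2} < -\frac{1}{s^2} && \text{for } s \in \mathbb{N},
\end{aligned} \tag{136}
$$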


which are true for s > 1. This concludes the proof for s > 1. 
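Read as a comparison of slopes in the $(a,b)$-plane, the conditions (136) are easy to evaluate. The sketch below, with a hypothetical function name, checks them for the first few values of $s$ and reproduces the failure at $s = 1$ addressed in the next paragraph.

```python
import math


def slope_conditions_hold(s):
    """Evaluate the inequalities (136) for the tangent line phi at v."""
    slope_phi = -(1 + math.sqrt(2 * s + 3)) / (1 + s) ** 2
    upper = -1 / s**2                        # boundary piece a + s^2 b = const (b >= 0)
    if (2 * s) % 2 == 1:                     # half-integer s: piece a + b/4 = const (b < 0)
        lower = -4.0
    else:                                    # integer s: vertical piece a = const (b < 0)
        lower = -math.inf
    return lower < slope_phi < upper


print([slope_conditions_hold(k / 2) for k in range(2, 9)])
# [False, True, True, True, True, True, True]: fails at s = 1, holds for s = 3/2, ..., 4
```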

For $s = 1$, this last step of the above proof fails. Indeed, the level set $K_v$, which was determined by taking the point $v$ on the horizontal axis that also lies on the boundary of $\Omega$, does intersect $\Omega$, as can be seen in FIG. 13. We therefore have to take a level set of $K$ for a slightly smaller value. Since the tangents of the level sets are all the same for $b > 0$, we can readily find the level set which is tangent to $\Omega$. This gives the optimal radius $r_{\min}(1) = 5/4$ and the point


$$
v = \big(A_s(r_{\min}(s), s),\, B_s(r_{\min}(s), s)\big) = (5/16,\, 1/16). \tag{137}
$$

Analogously to the above arguments, one easily verifies that $\{v\} = K_v \cap \Omega$.

Finally, for $s = 1/2$ one can draw the conclusion directly from the form of $I(\tfrac12, r, n, m)$ given in Lemma 11: this expression does not depend on $m$, and has a unique global minimum at $r_{\min}(1/2) = 1/2$ and $n = +1/2$, where the optimal probability measure $F$ must therefore be concentrated.

In all cases, the optimal value $\Delta_{\min}(s)$ is computed by substituting the obtained optimal $r_{\min}(s)$ and $n = s$ in (114) (respectively (117) for $s = 1/2$). □
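As a consistency check of (127), the sketch below restricts to point measures $F_n(dr)$ at $n = s$, the form of the optimizer asserted in Theorem 12 (it does not search over general $F$). It minimizes $\max_m I(s,r,s,m)$ over $r$ by ternary search, which is valid because for $n = s$ each $A_s + m^2 B_s$ is a convex quadratic in $r$. The result is compared with $r_{\min}(s)$ and with the values listed for $\Delta_{\min}(s)$, i.e., the minimum of the squared figure of merit (113). Function names are hypothetical.

```python
import math


def A_s(s, r, n):
    S, D = s * (s + 1), (2 * s - 1) * (2 * s + 3)
    return r**2 * (-2 * n**2 + 2 * S - 1) / D            # Eq. (115), simplified


def B_s(s, r, n):
    S, D = s * (s + 1), (2 * s - 1) * (2 * s + 3)
    return (6 * n**2 * r**2 - 2 * n * r * (4 * S - 3)
            + S * (-2 * r**2 + 4 * S - 3)) / (S * D)      # Eq. (116)


def worst_case(s, r, n):
    """max_m I(s, r, n, m) for a point measure F_n(dr) at (n, r)."""
    if s == 0.5:
        return r**2 / 3 - 2 * r * n / 3 + 0.25           # Eq. (117), independent of m
    ms = [s - k for k in range(int(round(2 * s)) + 1)]
    return max(A_s(s, r, n) + m**2 * B_s(s, r, n) for m in ms)


def argmin_convex(f, lo, hi, iters=200):
    """Ternary search; worst_case(s, ., s) is a maximum of convex quadratics in r."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


def predicted(s):
    """(r_min(s), Delta_min(s)) as listed in Eq. (127)."""
    if s == 0.5:
        return 0.5, 1 / 6
    if s == 1:
        return 1.25, 3 / 8
    t = math.sqrt(2 * s + 3)
    return (2 * s - t + 3) / 2, (s - t + 2) / 2


for s in (0.5, 1, 1.5, 2, 5, 13.5):
    r_best = argmin_convex(lambda r: worst_case(s, r, s), 0.0, 2 * s + 4)
    print(s, r_best, worst_case(s, r_best, s), predicted(s))
```

For $s > 1$ the minimum sits at the kink where $B_s(r,s)$ changes sign, which is why the maximum over $m$ becomes trivial there.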


V. CONCLUSIONS AND OUTLOOK 

Uncertainty relations can be built for any collection of observables. In this paper we provided some methods, 
which work in a general setting, but chiefly looked at angular momentum as one of the paradigmatic cases of 
non-commutativity in quantum mechanics. 

The basic mathematical methods are well developed for the case of preparation uncertainty, so that even in general cases the optimal tradeoff curves can be generated efficiently. We resorted to numerics quite often, since it turns out that the salient optimization problems can rarely be solved analytically for general $s$. One of the features one might hope to settle analytically in the future is the asymptotic estimate $C_2(s) \propto s^{2/3}$, which the numerics produce with a precision that suggests an exact result.
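To illustrate the kind of numerical optimization meant here, the following is a minimal sketch (our own illustration, not the authors' code) for one simple case: the smallest possible value of $\Delta^2(J_x) + \Delta^2(J_y)$ over all spin-$s$ states. It uses the identity $\Delta^2_\rho(A) = \min_a \operatorname{tr}\rho(A-a)^2$, which turns the minimum over states into a minimum over $a$ of the ground-state energy of $(J_x-a)^2 + J_y^2$; a rotation about the $z$-axis removes the shift of $J_y$. We do not assume that this quantity is the $C_2(s)$ mentioned above. The function names are hypothetical, and the refined grid search is a simple stand-in for a proper one-dimensional optimizer.

```python
import numpy as np


def spin_matrices(s):
    """Spin-s matrices (hbar = 1) in the basis m = s, s-1, ..., -s."""
    dim = int(round(2 * s + 1))
    m = s - np.arange(dim)  # magnetic quantum numbers, descending
    # <m+1|J_+|m> = sqrt(s(s+1) - m(m+1)) lies on the first superdiagonal
    jp = np.diag(np.sqrt(s * (s + 1) - m[1:] * (m[1:] + 1)), k=1)
    jx = (jp + jp.T) / 2
    jy = (jp - jp.T) / 2j
    jz = np.diag(m)
    return jx, jy, jz


def lowest_eigenvalue(a, jx, base):
    """Smallest eigenvalue of (Jx - a)^2 + Jy^2, with base = Jx^2 + Jy^2."""
    h = base - 2 * a * jx + a * a * np.eye(jx.shape[0])
    return np.linalg.eigvalsh(h)[0]


def min_variance_sum_xy(s, grid=401, rounds=4):
    """Minimal Var(Jx) + Var(Jy) over all spin-s states.

    min over states of the variance sum equals min over (a, b) of the
    ground-state energy of (Jx - a)^2 + (Jy - b)^2. Rotating about z
    lets us take b = 0 and 0 <= a <= s, since a = <Jx> at the optimum.
    """
    jx, jy, _ = spin_matrices(s)
    base = jx @ jx + jy @ jy
    lo, hi = 0.0, float(s)
    best = 0.0
    for _ in range(rounds):
        # coarse scan, then zoom in around the best grid point
        grid_a = np.linspace(lo, hi, grid)
        vals = [lowest_eigenvalue(a, jx, base) for a in grid_a]
        best = grid_a[int(np.argmin(vals))]
        step = grid_a[1] - grid_a[0]
        lo, hi = max(0.0, best - step), min(float(s), best + step)
    return lowest_eigenvalue(best, jx, base)


for s in (0.5, 1, 2, 5):
    print(s, min_variance_sum_xy(s))
```

For $s = 1/2$ and $s = 1$ the result can be checked by hand: it equals $1/4$ and $7/16$, respectively, whereas the choice $a = 0$ alone would only give the value $s$.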

Much is left to be done for entropic uncertainty. Here we gave only some basic comparisons to the variance 
case. It would be interesting to see whether the entropic relations can be refined to the point that they can be 
used to derive sharp variance inequalities as Hirschman did in the phase space case [10]. 

For measurement uncertainty the general situation is not so favourable, perhaps due to the much more recent introduction of the subject. At this point we know of no efficient way to derive sharp bounds for generic pairs of observables. Nevertheless, we were able to treat the case of a joint measurement of all components in arbitrary directions, because in this case rotational symmetry is not broken and leads to considerable simplification. One of these simplifications is the observation that the two basic error criteria, namely metric uncertainty and calibration error, lead to the same results. This was already familiar from the phase space case. However, a further simplification one might have expected from this analogy definitely does not hold: there seems to be no quantitative link between preparation and measurement uncertainty for angular momentum. Further research will show whether useful general connections between the two faces of the uncertainty coin can be established.

The large-$s$ limit $s \to \infty$ can be understood as a mean-field limit [24], when the spin-$s$ representation is considered as $2s$ copies of a spin-1/2 system in a symmetric state. We can also see this as a classical limit $\hbar \to 0$ [34], in the sense that the angular momentum in physical units, i.e., $\hbar s$, is held fixed, so that the dimensionless half-integer representation parameter $s$ has to diverge. This offers a way to treat not just the uncertainty aspects of this limit, but also the limit of the whole theory of angular momentum.
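The identification of the spin-$s$ representation with the symmetric subspace of $2s$ spin-1/2 systems can be checked numerically. The sketch below is our own illustration, not code from this paper, and its helper names (`collective_spin`, `symmetric_sector`, `variance`) are hypothetical. It builds the total spin of $n = 2s$ qubits, confirms that the eigenvalue $s(s+1)$ of $J^2$ occurs with multiplicity $2s+1$, and checks that in the fully polarized product state $\Delta^2(J_x) = s/2$, so that the relative spread $\sqrt{s/2}/s$ shrinks as $s$ grows.

```python
from functools import reduce

import numpy as np

# Single-site spin-1/2 operators (hbar = 1).
SX = np.array([[0, 1], [1, 0]], dtype=complex) / 2
SY = np.array([[0, -1j], [1j, 0]], dtype=complex) / 2
SZ = np.array([[1, 0], [0, -1]], dtype=complex) / 2


def site_operator(op, k, n):
    """op acting on qubit k of n, identity on all other qubits."""
    factors = [np.eye(2, dtype=complex)] * n
    factors[k] = op
    return reduce(np.kron, factors)


def collective_spin(n):
    """Total spin components J_a = sum_k S_a^(k) for n spin-1/2 systems."""
    return [sum(site_operator(op, k, n) for k in range(n)) for op in (SX, SY, SZ)]


def symmetric_sector(n):
    """Return s = n/2, the multiplicity of J^2 = s(s+1), and the largest J^2 eigenvalue."""
    jx, jy, jz = collective_spin(n)
    evals = np.linalg.eigvalsh(jx @ jx + jy @ jy + jz @ jz)
    s = n / 2
    count = int(np.sum(np.isclose(evals, s * (s + 1))))
    return s, count, evals.max()


def variance(op, psi):
    """Variance of a Hermitian operator in the pure state psi."""
    mean = np.vdot(psi, op @ psi).real
    return np.vdot(psi, op @ op @ psi).real - mean**2


for n in (1, 2, 3, 4):
    print(symmetric_sector(n))

# Fully polarized state |up ... up> is the first basis vector of the Kronecker product.
jx, _, _ = collective_spin(4)
up = np.zeros(16, dtype=complex)
up[0] = 1
print(variance(jx, up))  # equals s/2 = 1 for n = 4, up to rounding
```

The brute-force tensor-product construction scales as $2^n$, so it is only meant for small $n$; it makes the symmetric-subspace picture explicit rather than serving as an efficient large-$s$ method.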